Auditory researchers believe that the human auditory system computes many different representations of sound, reflecting different time and frequency resolutions. In contrast, automatic speech recognition systems tend to be based on a single representation of the short-term speech spectrum.
We are attempting to improve the robustness of automatic speech recognition systems by analyzing the spectrogram with a set of two-dimensional Gabor filters that vary in their extent in time, their extent in frequency, and their ripple rates [1]. These filters share some characteristics with the responses of neurons in the auditory cortex of primates, and can also be viewed as two-dimensional frequency analyzers.
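As an illustration of this kind of analysis, the following minimal sketch builds one two-dimensional Gabor filter and applies it to a spectrogram. It is not the implementation used in [1]: the Gaussian envelope, the cosine carrier, the mean removal, the zero padding, and all function and parameter names (`gabor_filter_2d`, `filter_spectrogram`, `ripple_freq`, `ripple_time`, `sigma_freq`, `sigma_time`) are assumptions chosen for clarity.

```python
import numpy as np


def gabor_filter_2d(n_freq, n_time, ripple_freq, ripple_time,
                    sigma_freq, sigma_time):
    """Real-valued 2D Gabor kernel: Gaussian envelope times a cosine carrier.

    Hypothetical parameterization (not taken from the cited work):
      n_freq, n_time          kernel extent in frequency channels and frames
      ripple_freq             spectral ripple rate, cycles per channel
      ripple_time             temporal ripple rate, cycles per frame
      sigma_freq, sigma_time  envelope widths in channels and frames
    """
    # Coordinates centered on the middle of the kernel.
    f = np.arange(n_freq) - (n_freq - 1) / 2.0
    t = np.arange(n_time) - (n_time - 1) / 2.0
    F, T = np.meshgrid(f, t, indexing="ij")

    envelope = np.exp(-0.5 * ((F / sigma_freq) ** 2 + (T / sigma_time) ** 2))
    carrier = np.cos(2.0 * np.pi * (ripple_freq * F + ripple_time * T))
    kernel = envelope * carrier

    # Remove the mean so that a flat spectrogram region gives zero response.
    return kernel - kernel.mean()


def filter_spectrogram(spec, kernel):
    """Correlate a (channels x frames) spectrogram with a 2D kernel.

    The spectrogram is zero-padded so the output has the same shape as the
    input; values near the edges are affected by the padding.
    """
    kf, kt = kernel.shape
    pad_f, pad_t = kf // 2, kt // 2
    padded = np.pad(spec, ((pad_f, kf - 1 - pad_f), (pad_t, kt - 1 - pad_t)))
    # windows[i, j] is the kernel-sized patch aligned with output cell (i, j).
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel.shape)
    return np.einsum("ftij,ij->ft", windows, kernel)


if __name__ == "__main__":
    # Stand-in for a real log-mel spectrogram: 23 channels, 200 frames.
    rng = np.random.default_rng(0)
    spectrogram = rng.standard_normal((23, 200))

    # A small bank with differing extents and ripple rates (arbitrary values).
    bank = [
        gabor_filter_2d(9, 25, 0.0, 0.10, 3.0, 5.0),    # purely temporal ripple
        gabor_filter_2d(15, 9, 0.12, 0.0, 4.0, 2.0),    # purely spectral ripple
        gabor_filter_2d(11, 15, 0.08, 0.06, 3.0, 4.0),  # diagonal ripple
    ]
    features = np.stack([filter_spectrogram(spectrogram, k) for k in bank])
    print(features.shape)
```

Each kernel responds most strongly to spectro-temporal ripples near its own ripple rates, which is the sense in which such filters act as two-dimensional frequency analyzers. A filter bank is formed by varying the kernel extents and ripple rates, as in the three arbitrary examples above.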
Promising results have been obtained on a noisy digit recognition task [2], especially when the Gabor filter analysis was combined with more conventional analysis. Ongoing work applies this approach to larger-vocabulary recognition tasks and uses the Gabor filters in a multi-stream, multi-classifier architecture.