Gabor Filter Analysis for Automatic Speech Recognition

David Gelbart and Michael Kleinschmidt¹
(Professor Nelson H. Morgan)
Deutsche Forschungsgemeinschaft, Natural Sciences and Engineering Research Council of Canada, and German Ministry for Education and Research

Auditory researchers believe that the human auditory system computes many different representations of sound, reflecting different time and frequency resolutions. However, automatic speech recognition systems tend to be based on a single representation of the short-term speech spectrum.

We are attempting to improve the robustness of automatic speech recognition systems by analyzing a spectrogram with a set of two-dimensional Gabor filters that vary in their extents in time and frequency and in their ripple rates [1]. These filters have some characteristics in common with the responses of neurons in the auditory cortex of primates, and can also be seen as two-dimensional frequency analyzers.
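As a rough illustration of this kind of analysis, the sketch below builds a complex two-dimensional Gabor filter and evaluates its response at one point of a toy spectrogram. It is a minimal sketch under stated assumptions, not the filter set used in [1] or [2]: the Gaussian envelope, the function names gabor_filter and filter_response, all parameter values, and the synthetic spectrogram are hypothetical choices made for illustration.

```python
import cmath
import math


def gabor_filter(half_t, half_f, sigma_t, sigma_f, omega_t, omega_f):
    """Build a complex 2D Gabor filter (rows: frequency, columns: time).

    Assumed form: a Gaussian envelope with widths sigma_f and sigma_t,
    multiplied by a complex sinusoid with ripple rates omega_f
    (radians per channel) and omega_t (radians per frame).
    """
    rows = []
    for k in range(-half_f, half_f + 1):          # frequency offset
        row = []
        for n in range(-half_t, half_t + 1):      # time offset
            envelope = math.exp(-0.5 * ((n / sigma_t) ** 2 + (k / sigma_f) ** 2))
            carrier = cmath.exp(1j * (omega_t * n + omega_f * k))
            row.append(envelope * carrier)
        rows.append(row)
    return rows


def filter_response(spec, filt, k0, n0):
    """Correlate filt with spec, centred at channel k0 and frame n0.

    Points that fall outside the spectrogram are treated as zero.
    """
    half_f = len(filt) // 2
    half_t = len(filt[0]) // 2
    total = 0j
    for dk in range(-half_f, half_f + 1):
        for dn in range(-half_t, half_t + 1):
            k, n = k0 + dk, n0 + dn
            if 0 <= k < len(spec) and 0 <= n < len(spec[0]):
                total += spec[k][n] * filt[dk + half_f][dn + half_t].conjugate()
    return total


# Toy spectrogram (hypothetical): 8 channels, 40 frames, with a purely
# temporal modulation of 0.5 radians per frame in every channel.
toy_spec = [[1.0 + math.cos(0.5 * n) for n in range(40)] for _ in range(8)]

# One filter tuned to the modulation rate present in the toy data,
# and one tuned to a different temporal ripple rate.
matched_filter = gabor_filter(10, 3, 4.0, 2.0, omega_t=0.5, omega_f=0.0)
mismatched_filter = gabor_filter(10, 3, 4.0, 2.0, omega_t=2.0, omega_f=0.0)

matched = abs(filter_response(toy_spec, matched_filter, 4, 20))
mismatched = abs(filter_response(toy_spec, mismatched_filter, 4, 20))
print(matched, mismatched)
```

The filter tuned to the modulation present in the toy data gives the larger response magnitude, which is the sense in which such filters act as two-dimensional frequency analyzers. Varying sigma_t, sigma_f, omega_t, and omega_f yields filters with different extents and ripple rates.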

Promising results have been obtained in a noisy digit recognition task [2], especially when this analysis method was combined with more conventional analysis. Work is ongoing on applying this approach to larger-vocabulary recognition tasks and on using the Gabor filters in a multi-stream, multi-classifier architecture.

[1] M. Kleinschmidt, "Improving Word Accuracy with Gabor Feature Extraction," Forum Acusticum, Seville, Spain, September 2002.
[2] M. Kleinschmidt and D. Gelbart, "Spectro-Temporal Gabor Features as a Front End for Automatic Speech Recognition," Int. Conf. Spoken Language Processing, Denver, CO, September 2002.
¹Outside Adviser (non-EECS), University of Oldenburg

More information: http://www.icsi.berkeley.edu/~gelbart

Contact the author: gelbart@eecs.berkeley.edu

