Discriminative Features for Large Vocabulary Speech Recognition
Arlo Faria and Nelson Morgan
Large-vocabulary speech recognition can be improved by using discriminative features produced with a multi-layer perceptron (MLP) that classifies phones based on a local acoustic context. We have found that performance can be further improved by preparing the training data with idealized features, using forward-backward alignment with hidden Markov models corresponding to the reference word transcriptions. Additionally, we have substantially decreased MLP training times by sampling the training data such that all phone classes are nearly uniformly distributed. Future work will explore MLP structures that process many streams of information, possibly exploiting massively parallel computing with specialized hardware.
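As an illustration of the class-balanced sampling mentioned above, the following is a minimal sketch, not the authors' procedure. It assumes frame-level phone labels are available as a list and caps each class at a fixed number of frames. The function name `sample_balanced`, the `per_class` cap, and the toy label counts are hypothetical; the abstract does not specify how the sampling was performed.

```python
import random
from collections import Counter, defaultdict


def sample_balanced(labels, per_class, seed=0):
    """Return sorted frame indices with at most `per_class` frames per label.

    Assumption: capping frequent classes is one simple way to make the
    class distribution nearly uniform; rare classes keep all their frames.
    """
    rng = random.Random(seed)

    # Group frame indices by their phone label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for indices in by_class.values():
        if len(indices) > per_class:
            # Randomly subsample over-represented classes, without replacement.
            indices = rng.sample(indices, per_class)
        chosen.extend(indices)
    return sorted(chosen)


# Toy data (hypothetical): silence dominates, "zh" is rare.
labels = ["sil"] * 1000 + ["ah"] * 200 + ["zh"] * 20
subset = sample_balanced(labels, per_class=50)
counts = Counter(labels[i] for i in subset)
```

In this toy case the 1220 original frames shrink to 120, which suggests why such sampling could shorten training: frequent classes such as silence contribute far fewer frames, while rare classes are kept in full.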