Electrical Engineering and Computer Sciences


UC Berkeley


2008 Research Summary

Learning Structured Models for Phone Recognition

Slav Orlinov Petrov, Adam David Pauls and Daniel Klein

Modern speech recognition systems are very complex. A good model must account for the context-sensitive, time-dependent, and speaker-dependent nature of "phones," the basic phonological units of speech. Typically, this variation is either encoded in the model by hand using domain knowledge or not modeled at all.

We present a maximally streamlined approach to learning HMM-based acoustic models for automatic speech recognition. In our approach, an initial monophone HMM is iteratively refined using a split-merge EM procedure that makes no assumptions about subphone structure or context-dependent structure and that uses only a single Gaussian per HMM state. Despite the much-simplified training process, our acoustic model achieves state-of-the-art results on phone classification (where it outperforms almost all other methods) and competitive performance on phone recognition (where it outperforms standard context-dependent (CD) triphone / subphone / GMM approaches). We also present an analysis of what is and is not learned by our system.
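
To make the split-merge idea concrete, here is a minimal sketch of one refinement round for a single state with a one-dimensional Gaussian. It is an illustration under strong simplifying assumptions, not the authors' implementation: frames are treated as independent (no HMM transitions or forward-backward pass), the features are scalars, and the merge decision is a plain likelihood-gain threshold. The names `split_state`, `em_step`, `split_merge_state`, and the parameters `eps` and `min_gain` are hypothetical.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def fit_gaussian(xs):
    """Maximum-likelihood mean and variance (variance floored for stability)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, max(var, 1e-6)

def split_state(mean, var, eps=1.0):
    """Split one state into two substates whose means are offset by
    +/- eps standard deviations (eps is an assumed, illustrative setting)."""
    delta = eps * math.sqrt(var)
    return [(0.5, mean - delta, var), (0.5, mean + delta, var)]

def log_likelihood(xs, comps):
    """Total log-likelihood of xs under weighted components (log-sum-exp)."""
    total = 0.0
    for x in xs:
        logs = [math.log(w) + gaussian_logpdf(x, m, v) for w, m, v in comps]
        top = max(logs)
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total

def em_step(xs, comps):
    """One EM iteration: soft-assign frames to substates, then re-estimate."""
    resp = []
    for x in xs:  # E-step: posterior probability of each substate per frame
        logs = [math.log(w) + gaussian_logpdf(x, m, v) for w, m, v in comps]
        top = max(logs)
        ps = [math.exp(l - top) for l in logs]
        z = sum(ps)
        resp.append([p / z for p in ps])
    updated = []
    for k, old in enumerate(comps):  # M-step: weighted mean and variance
        nk = sum(r[k] for r in resp)
        if nk < 1e-12:
            updated.append(old)  # substate received no mass; keep it unchanged
            continue
        mean = sum(r[k] * x for r, x in zip(resp, xs)) / nk
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nk
        updated.append((max(nk / len(xs), 1e-12), mean, max(var, 1e-6)))
    return updated

def split_merge_state(xs, n_iters=25, min_gain=1.0):
    """Split a state, refine with EM, and merge back if the gain is too small."""
    mean, var = fit_gaussian(xs)
    single = [(1.0, mean, var)]
    comps = split_state(mean, var)
    for _ in range(n_iters):
        comps = em_step(xs, comps)
    gain = log_likelihood(xs, comps) - log_likelihood(xs, single)
    return comps if gain > min_gain else single

if __name__ == "__main__":
    # Toy "frames" drawn from two clearly separated regimes.
    frames = [-2.2, -2.1, -2.0, -1.9, -1.8, 1.8, 1.9, 2.0, 2.1, 2.2]
    for weight, mean, var in split_merge_state(frames):
        print(f"weight={weight:.2f} mean={mean:.2f} var={var:.3f}")
```

In a full system the same split, EM, and merge cycle would be applied to every HMM state and repeated over several rounds, with the E-step computed over state sequences rather than independent frames.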

S. Petrov, A. Pauls, and D. Klein, "Learning Structured Models for Phone Recognition," Proceedings of EMNLP-CoNLL, 2007.