Speaker Recognition Approaches Based on Word-Conditioning
We examine the effectiveness of various speaker recognition approaches based on keyword constraining, in which only the portions of speech utterances that fall under particular keywords are used to construct speaker recognition systems. Keywords are examined for their individual and combined effectiveness in four approaches: a keyword HMM approach, a supervector keyword HMM approach, a keyword phone N-grams approach, and a keyword phone HMM approach. Our results demonstrate the effectiveness of acoustic features and the importance of keyword frequency. We also demonstrate the power of SVMs, in conjunction with acoustic features, in keyword combination experiments in which the supervector keyword HMM approach outperforms the other keyword-based approaches and achieves an improvement over a cepstral GMM approach on NIST's Speaker Recognition Evaluation 2006 data.
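As a rough illustration of keyword constraining, the sketch below keeps only the feature-frame indices that fall inside occurrences of chosen keywords. It assumes a word-level time alignment is available as `(word, start_seconds, end_seconds)` tuples and a 10 ms frame shift; the function name, the data layout, and the frame shift are all hypothetical and are not taken from the systems described here.

```python
from typing import Dict, Iterable, List, Set, Tuple

FRAME_SHIFT_S = 0.01  # assumed 10 ms frame shift (not specified in the text)


def keyword_frames(
    words: Iterable[Tuple[str, float, float]],
    keywords: Set[str],
    n_frames: int,
    frame_shift: float = FRAME_SHIFT_S,
) -> Dict[str, List[int]]:
    """Map each keyword to the frame indices covered by its occurrences.

    `words` is a hypothetical word alignment: (word, start_s, end_s).
    Frames outside every keyword occurrence are simply dropped.
    """
    selected: Dict[str, List[int]] = {}
    for word, start, end in words:
        key = word.lower()
        if key not in keywords:
            continue
        # Convert times to frame indices, clipped to the utterance length.
        first = max(0, int(round(start / frame_shift)))
        last = min(n_frames, int(round(end / frame_shift)))
        selected.setdefault(key, []).extend(range(first, last))
    return selected


alignment = [("yeah", 0.00, 0.03), ("the", 0.03, 0.05), ("Yeah", 0.10, 0.12)]
frames = keyword_frames(alignment, {"yeah"}, n_frames=11)
```

Each per-keyword frame list could then feed a separate keyword model; how the frames are modeled (HMM, supervector, phone N-grams) is outside this sketch.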
Various established techniques for pre- and post-processing of speech data are also examined, including cepstral feature warping, nuisance attribute projection, and within-class covariance speaker normalization. All of these techniques improve our speaker recognition systems.
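To make the first of these techniques concrete, here is a minimal sketch of cepstral feature warping for a single feature stream, as it is commonly described: within a sliding window, the value at the window centre is replaced by the standard-normal quantile of its rank. The window length of 301 frames and the rank-to-quantile formula `(N + 0.5 - R) / N` are assumptions based on the usual formulation, not details confirmed by this text.

```python
from statistics import NormalDist
from typing import List, Sequence


def warp_feature(values: Sequence[float], window: int = 301) -> List[float]:
    """Warp one cepstral coefficient stream toward a standard normal.

    For each frame, rank its value among the frames in a window centred
    on it (rank 1 = largest), then map that rank to a normal quantile.
    The window is truncated at the utterance edges.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    half = window // 2
    warped: List[float] = []
    for t, v in enumerate(values):
        lo = max(0, t - half)
        hi = min(len(values), t + half + 1)
        win = values[lo:hi]
        n = len(win)
        rank = 1 + sum(1 for x in win if x > v)
        # (n + 0.5 - rank) / n always lies strictly between 0 and 1.
        warped.append(std_normal.inv_cdf((n + 0.5 - rank) / n))
    return warped


warped = warp_feature([3.0, 1.0, 2.0], window=7)
```

In practice the same function would be applied independently to each cepstral dimension; the quadratic-time ranking here favors clarity over speed.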
- S. Kajarekar, L. Ferrer, A. Venkataraman, K. Sonmez, E. Shriberg, A. Stolcke, and R. R. Gadde, "Speaker Recognition Using Prosodic and Lexical Features," Proc. of the IEEE Automatic Speech Recognition and Understanding Workshop, 2003, pp. 19-24.
- W. Campbell, D. Sturim, D. Reynolds, and A. Solomonoff, "SVM-Based Speaker Verification Using a GMM Supervector Kernel and NAP Variability Compensation," Proc. of ICASSP, Vol. 1, 2006, pp. 97-100.
- A. Hatch, S. Kajarekar, and A. Stolcke, "Within-Class Covariance Normalization for SVM-based Speaker Recognition," Proc. of ICSLP-Interspeech, 2006.