Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

2008 Research Summary

Use of High-Level Features for Speaker Recognition

Howard Hao Lei and Nelson Morgan

It has been shown that idiolectal content in human speech (i.e., phones and words), when used alone as features, holds surprisingly good speaker-discriminative power [1,2]. We extend the state of the art in speaker recognition by using individual words to constrain sequences of phones (phone N-grams), so that only the phone N-grams corresponding to a selected set of words are used as features for speaker recognition. Words differ in how well their phone N-grams discriminate between speakers, because different speakers may pronounce a given word differently. Each word thus acts as a separate system, and word systems can be combined at the feature and score levels. The best approach results from feature-level combination of 52 word systems with phone N-grams of order 1, 2, and 3 as features. A support vector machine with a linear kernel is used for training and testing [3]. Our system achieves a 17.7% improvement over a non-word-conditioned phone N-gram system [2]. In combination with a GMM-based system, it also achieves a 17.5% improvement over a non-word-conditioned phone N-gram system, suggesting that the word-conditioned features provide more complementary information. The system achieves a 5% equal error rate (EER) standalone, and a 3.3% EER in combination with the phone N-gram and GMM-based systems. All results are obtained on NIST's 2005 evaluation corpus.
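The word-conditioning idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the input format (a list of word tokens paired with their aligned phones), the function names, the example word list, and the per-word relative-frequency normalization are all assumptions made for illustration. Keying each feature by its word amounts to concatenating the per-word feature vectors, which is one simple reading of feature-level combination; the resulting vector would then be passed to a linear-kernel SVM, which is not shown here.

```python
from collections import Counter
from typing import Dict, Iterator, Sequence, Set, Tuple

# Hypothetical input format: one (word, phones) pair per word token,
# e.g. produced by aligning a phone recognizer's output to a word transcript.
AlignedToken = Tuple[str, Sequence[str]]
Feature = Tuple[str, Tuple[str, ...]]  # (conditioning word, phone N-gram)


def phone_ngrams(phones: Sequence[str], max_order: int = 3) -> Iterator[Tuple[str, ...]]:
    """Yield every phone N-gram of order 1..max_order inside one word token."""
    for n in range(1, max_order + 1):
        for start in range(len(phones) - n + 1):
            yield tuple(phones[start:start + n])


def word_conditioned_features(
    aligned: Sequence[AlignedToken],
    selected_words: Set[str],
    max_order: int = 3,
) -> Dict[Feature, float]:
    """Relative frequencies of phone N-grams, kept only for selected words.

    Assumption: counts are normalized separately within each word, so that
    each word behaves like its own small feature vector ("word system").
    """
    counts: Counter = Counter()
    per_word_total: Counter = Counter()
    for word, phones in aligned:
        if word not in selected_words:
            continue  # phones outside the selected words are discarded
        for gram in phone_ngrams(phones, max_order):
            counts[(word, gram)] += 1
            per_word_total[word] += 1
    return {
        (word, gram): count / per_word_total[word]
        for (word, gram), count in counts.items()
    }


# Toy conversation side: "yeah" is pronounced two different ways.
aligned_example = [
    ("yeah", ["y", "ae"]),
    ("the", ["dh", "ah"]),
    ("yeah", ["y", "eh", "ah"]),
]
features = word_conditioned_features(aligned_example, selected_words={"yeah"})
```

In this toy example only the phones aligned to "yeah" contribute features, and the two pronunciations of that word yield different N-grams, which is the kind of speaker-specific variation the word-conditioned features are meant to capture.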

[1] G. Doddington, "Speaker Recognition Based on Idiolectal Differences between Speakers," Proc. Eurospeech, 2001, pp. 2521-2524.
[2] A. O. Hatch, B. Peskin, and A. Stolcke, "Improved Phonetic Speaker Recognition Using Lattice Decoding," Proc. ICASSP, Vol. 1, March 2005, pp. 169-172.
[3] W. M. Campbell, J. P. Campbell, D. A. Reynolds, D. A. Jones, and T. R. Leek, "Phonetic Speaker Recognition with Support Vector Machines," Advances in Neural Information Processing Systems 16, 2004.