Electrical Engineering and Computer Sciences


UC Berkeley


2008 Research Summary

Speaker Diarization for Broadcast News

Yan Huang and Nelson Morgan

Speaker diarization [1], also known as the "who spoke when?" task, segments audio into speaker-homogeneous regions. It allows speakers to be tracked, audio to be indexed and retrieved, and conversations to be followed. It also aids speech recognition by facilitating speaker adaptation techniques. Speaker diarization is thus an important step toward intelligent machine hearing.

The challenges of speaker diarization in the broadcast news domain stem from multi-channel speech, overlapped speech, and short speech segments in interactive interviews, plus background music, commercials, and the like. We adopt the spectral clustering [2] paradigm and formalize speaker segmentation as a graph partitioning problem. We are trying to address open issues in spectral clustering, such as how to estimate the number of clusters and how to set the scaling parameters, in this specific application domain. We are also exploiting speaker features, such as speaker anchor models [3].
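
To make the graph partitioning view concrete, the following is a minimal sketch of generic normalized spectral clustering. It is not the system described in this summary. It assumes a Gaussian affinity with a single scaling parameter `sigma`, estimates the number of clusters with the common eigengap heuristic, and uses toy 2-D points as a stand-in for per-segment speaker features. All function names are hypothetical and chosen for illustration.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Affinity exp(-||xi - xj||^2 / (2 sigma^2)) with a zero diagonal."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A

def normalized_eig(A):
    """Eigenpairs of D^{-1/2} A D^{-1/2}, eigenvalues in descending order."""
    d = A.sum(axis=1)
    s = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = A * s[:, None] * s[None, :]
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def estimate_num_clusters(vals, max_k):
    """Eigengap heuristic (an assumption here): pick k at the largest gap."""
    gaps = vals[:max_k] - vals[1:max_k + 1]
    return int(np.argmax(gaps)) + 1

def kmeans(Y, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def spectral_cluster(X, sigma, max_k):
    """Cluster rows of X; returns (labels, estimated number of clusters)."""
    vals, vecs = normalized_eig(gaussian_affinity(X, sigma))
    k = estimate_num_clusters(vals, max_k)
    Y = vecs[:, :k]
    # Project each row of the spectral embedding onto the unit sphere.
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return kmeans(Y, k), k

# Toy data: three well-separated groups of 20 points each (synthetic,
# standing in for feature vectors of segments from three speakers).
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([m + 0.5 * rng.standard_normal((20, 2)) for m in means])
labels, k = spectral_cluster(X, sigma=1.0, max_k=6)
print("estimated clusters:", k)
```

On such cleanly separated data both the eigengap and the choice of `sigma` are forgiving. On real broadcast audio, the affinity would be computed between speech segments rather than 2-D points, and the result can be sensitive to exactly the two open issues named above: the cluster-count estimate and the scaling parameter.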

[1] D. Reynolds and P. Torres-Carrasquillo, "Approaches and Applications of Audio Diarization," Proc. IEEE ICASSP, 2005.
[2] F. R. Bach and M. I. Jordan, "Learning Spectral Clustering," Neural Info. Processing Systems 16 (NIPS), 2003.
[3] D. E. Sturim, D. A. Reynolds, E. Singer, and J. P. Campbell, "Speaker Indexing in Large Audio Databases Using Anchor Models," Proc. IEEE ICASSP, 2001, pp. 429-432.