Speaker Diarization for Broadcast News
Yan Huang and Nelson Morgan
Speaker diarization (Reynolds and Torres-Carrasquillo, 2005), also known as the "who spoke when?" task, segments audio into speaker-homogeneous regions. It allows speakers to be tracked, audio to be indexed and retrieved, and conversations to be followed. It also aids speech recognition by enabling speaker adaptation techniques. Speaker diarization is therefore an important step toward intelligent machine hearing.
The challenges of speaker diarization in the broadcast news domain arise from multi-channel speech, overlapped speech, and short speech segments in interactive interviews, as well as background music and commercials. We adopt the spectral clustering paradigm (Bach and Jordan, 2003) and formalize speaker segmentation as a graph partitioning problem. We are addressing open issues in spectral clustering, such as how to estimate the number of clusters and how to set the scaling parameters, in this specific application domain. We are also exploring speaker features, such as speaker anchor models (Sturim et al., 2001).
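The graph-partitioning view can be illustrated with a minimal sketch. It is not the system described here, only a common normalized spectral clustering recipe with an eigengap heuristic for the number of clusters. It assumes each speech segment has already been reduced to a fixed-length feature vector (a row of the hypothetical array `X`); the names `sigma` (the Gaussian scaling parameter), `max_k`, and all function names are illustrative assumptions.

```python
import numpy as np

def affinity_matrix(X, sigma):
    """Gaussian affinity between segments; sigma is the scaling parameter."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-loops in the segment graph
    return A

def estimate_num_clusters(eigvals_desc, max_k):
    """Eigengap heuristic: pick k at the largest drop among the top eigenvalues."""
    gaps = eigvals_desc[:max_k] - eigvals_desc[1:max_k + 1]
    return int(np.argmax(gaps)) + 1

def kmeans(Y, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.sum((Y[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def spectral_cluster(X, sigma=1.0, max_k=10):
    """Assumed pipeline: affinity -> normalized matrix -> eigenvectors -> k-means."""
    A = affinity_matrix(X, sigma)
    degree = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(L)               # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending
    k = estimate_num_clusters(eigvals, min(max_k, len(X) - 1))
    Y = eigvecs[:, :k]
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return kmeans(Y, k), k
```

In this sketch, both `sigma` and the choice of `k` are set by simple heuristics; these are exactly the quantities the paragraph above identifies as open issues, and real broadcast news features would likely need more careful choices.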
- D. Reynolds and P. Torres-Carrasquillo, "Approaches and Applications of Audio Diarization," Proc. IEEE ICASSP, 2005.
- F. R. Bach and M. I. Jordan, "Learning Spectral Clustering," Neural Info. Processing Systems 16 (NIPS), 2003.
- D. E. Sturim, D. A. Reynolds, E. Singer, and J. P. Campbell, "Speaker Indexing in Large Audio Databases Using Anchor Models," Proc. IEEE ICASSP, 2001, pp. 429-432.