Using Laughter in Speaker Recognition
Mary Tai Knox, Nikki Mirghafori¹ and Nelson Morgan
Audio communication contains a wealth of information beyond the spoken words. In particular, laughter provides cues about the emotional state of the speaker, topic changes in a conversation, and the speaker's identity. Our current goal is to develop an automatic speaker recognition system that relies on features extracted from laughter segments.
Since most speaker recognition datasets do not consistently transcribe laughter, we first need to build an automatic laughter segmenter. Previously, we used neural networks trained on short-term features (including Mel-cepstral coefficients, pitch, and energy) to compute the probability that each frame was laughter. This system achieved an 8% equal error rate (EER). While the EER was quite promising, we found that within a laughter segment the output probability varied more than desired, causing the system to classify sequential frames as alternately laughter and non-laughter. We are currently working to improve our results so that the audio is more consistently marked with a single start and stop time for each laughter segment.
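One common way to reduce such frame-level flicker is to smooth the per-frame probabilities before thresholding them into segments. The sketch below is illustrative only and is not the system described above; the median-filter window, the 0.5 threshold, and the function names are assumptions for demonstration.

```python
def smooth(probs, window=5):
    """Median-filter per-frame probabilities to suppress isolated flips.

    Edge frames are handled by padding with the boundary values so every
    position sees a full odd-length window.
    """
    half = window // 2
    padded = [probs[0]] * half + list(probs) + [probs[-1]] * half
    return [sorted(padded[i:i + window])[half] for i in range(len(probs))]

def segments(probs, threshold=0.5):
    """Return (start_frame, end_frame) pairs for runs of frames at or
    above the threshold; multiply by the frame period to get seconds."""
    segs, start = [], None
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
        elif start is not None:
            segs.append((start, i))
            start = None
    if start is not None:
        segs.append((start, len(probs)))
    return segs
```

For example, a raw probability sequence with a single dipped frame in the middle of a laughter burst would yield two fragmented segments, whereas the median-smoothed sequence yields a single segment with one start and one stop frame.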
¹International Computer Science Institute