Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

   

Research Projects

Object Category Recognition Using Probabilistic Fusion of Speech and Image Classifiers

Trevor Darrell and Kate Saenko¹

Multimodal scene understanding is an integral part of human-robot interaction (HRI) in situated environments. Category-level recognition is especially useful: the system recognizes classes of objects or scenes rather than specific instances (e.g., any chair vs. this particular chair). Humans use multiple modalities to understand which object category is being referred to, using one modality to disambiguate the information contained in the others. We address the problem of fusing visual and acoustic information to predict object categories when both an image of the object and speech input from the user are available to the HRI system. Using probabilistic decision fusion, we have shown improved classification rates compared to using either modality alone.
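As an illustration of what decision-level fusion can look like, the sketch below combines the per-category posteriors of a speech classifier and an image classifier with a weighted log-linear (product) rule. This is a generic rule chosen for illustration and is not necessarily the one used in this project; the function name `fuse_posteriors`, the `speech_weight` parameter, and all probability values are hypothetical.

```python
import math


def fuse_posteriors(p_speech, p_image, speech_weight=0.5):
    """Fuse two posteriors over the same categories (illustrative sketch).

    Assumption: a weighted log-linear combination, i.e.
    score(c) = w * log p_speech(c) + (1 - w) * log p_image(c),
    renormalized so the fused scores sum to 1.
    """
    if p_speech.keys() != p_image.keys():
        raise ValueError("both classifiers must score the same categories")
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")

    eps = 1e-12  # avoids log(0) when a classifier assigns zero probability
    log_scores = {
        c: speech_weight * math.log(p_speech[c] + eps)
        + (1.0 - speech_weight) * math.log(p_image[c] + eps)
        for c in p_speech
    }

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Made-up posteriors: the speech classifier confuses "chair" with the
# acoustically similar "hair", while the image classifier does not.
p_speech = {"chair": 0.45, "hair": 0.50, "pen": 0.05}
p_image = {"chair": 0.70, "hair": 0.05, "pen": 0.25}

fused = fuse_posteriors(p_speech, p_image)
best = max(fused, key=fused.get)
print(best, fused)
```

In this made-up example the speech classifier alone would pick "hair", but the fused posterior favors "chair" because the image evidence resolves the acoustic confusion. In practice the weight would be tuned on held-out data rather than fixed at 0.5.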

[1] Kate Saenko and Trevor Darrell, "Object Category Recognition Using Probabilistic Fusion of Speech and Image Classifiers," Proc. MLMI 2007.

¹ MIT