Advances in vision, speech, and sensor technologies in recent years have opened up a number of new and exciting avenues for ubiquitous computing research. One emerging application in ubicomp is Smart Spaces, or Intelligent Environments, the goal of which is to augment our conventional living and working spaces with computing and inference power so that they become more aware of changes, particularly in the activities and behaviors of their occupants. Based on this knowledge, the physical world can learn to adapt to better accommodate people's diverse needs.
Smart Space is not a single application; it is a platform on which a vast number of applications can be built and tested. The canonical example is automatic adjustment of room conditions, such as lighting, temperature, or stereo volume, based on detection of user identity and prior knowledge of their preferences. Smart Space technology also has many applications in the health domain. One example is sensing and monitoring the well-being of occupants (especially the elderly) and alerting relevant personnel in case of emergencies. In many ways, a Smart Space can be likened to a house butler, albeit one that is invisible to the occupants.
At the BiD, we are in the process of creating a smart work space that utilizes advanced speech, vision, and sensing technologies. One critical open issue for the success of this class of applications is their ability to predict, with a high degree of accuracy, the nature of the events that they "observe". We approach this issue from an HCI standpoint by incorporating contextual information into the event inference system. Context, which includes information about people's identity and the activities in which they are engaged, should play a critical role in making sensible and accurate predictions about external events.
Speaker localization and identification are two important smart room technologies that would allow us to gather information to construct location-based contexts. We are investigating the use of large microphone arrays for estimating and tracking the spatial locations of active sound sources in a lab environment.
A set of 36 dynamic, unidirectional microphones is arranged in a 6 by 6 grid to capture speech signals. The data are channeled by 828mkII FireWire audio interfaces to the processing subsystem.
The positions of active sound sources are computed by running a combination of digital signal processing techniques and numerical algorithms on the raw speech signals. The movement of sound sources and their relative intensities are visualized using a Java applet.
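The specific signal processing techniques are not described here. As an illustration only, one common building block for microphone-array localization is estimating the time difference of arrival between a pair of microphones by cross-correlation, then converting that delay to a bearing. The Python sketch below assumes a far-field source, a nominal speed of sound of 343 m/s, and hypothetical function names (`estimate_delay`, `bearing_from_delay`); it is not a description of the actual system.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed nominal value for room-temperature air


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag (in samples) at which sig_b best matches sig_a.

    A positive result means sig_b is a delayed copy of sig_a.
    Uses brute-force time-domain cross-correlation over [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_from_delay(lag_samples, sample_rate, mic_spacing):
    """Far-field angle of arrival (radians from broadside) for one mic pair."""
    path_difference = lag_samples / sample_rate * SPEED_OF_SOUND
    # Clamp so that noise-induced delays beyond the physical limit stay valid.
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing))
    return math.asin(ratio)


# Toy signals: the same pulse arrives 3 samples later at the second microphone.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
lag = estimate_delay(mic_a, mic_b, max_lag=5)
angle = bearing_from_delay(lag, sample_rate=48000, mic_spacing=0.1)
```

With a full grid of microphones, bearings or delays from many pairs could be combined numerically to estimate a position in the room. The sample rate and microphone spacing used above are illustrative values, not measurements from the array.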
We plan to extend the system with the ability to distinguish different "speakers" in the intermingled speech data. This would allow us to construct a clear picture of the areas of the room in which people spend most of their time. This information could then be used to inform design to better accommodate the needs of users.
Coming soon!