Active Capture

Active Capture: Goal Directed System Direction of Human Action

"Active Capture" is a paradigm in multimedia computing and applications that brings together capture, interaction, and processing, and sits at the intersection of these three capabilities. Most current human-computer interfaces exclude media capture and sit at the intersection of interaction and processing alone. To incorporate media capture into an interaction without requiring signal processing beyond current capabilities, the interaction itself must be designed to supply context that the processing can leverage. For example, if the system wants to take a picture of the user smiling, it can interact with the user to get them to face the camera and smile, and then use simple, robust parsers (such as an eye finder and a mouth motion detector) to aid in the contextualized capture, interaction, and processing. From the computer vision and audition side, Active Capture applications are high-context multimedia recognizers: they augment computer vision and audition parsers and recognizers with context from the interaction with the user.
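The smiling-picture example above can be sketched as a small direction loop. This is only an illustration under stated assumptions, not the implemented system: the `Frame` record, the `direct_smile_capture` function, the prompt wording, and the retry limit are all hypothetical, and the two boolean fields stand in for the output of an eye finder and a mouth motion detector. The point it tries to show is that a weak cue (mouth motion) can be read as a smile because the system has just asked for one.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Frame:
    """Hypothetical per-frame parser output (names are assumptions)."""
    eyes_found: bool    # stand-in for a simple eye finder
    mouth_moving: bool  # stand-in for a simple mouth motion detector


def direct_smile_capture(
    frames: Iterable[Frame],
    say: Callable[[str], None],
    max_prompts: int = 3,
) -> Optional[int]:
    """Direct the user toward a smiling, camera-facing pose.

    Returns the index of the frame chosen for capture, or None if the
    system gives up or the frames run out.
    """
    state = "WAIT_FACE"
    prompts = 0
    say("Please look at the camera.")
    for index, frame in enumerate(frames):
        if state == "WAIT_FACE":
            if frame.eyes_found:
                # Context established: the user is facing the camera.
                state = "WAIT_SMILE"
                say("Great. Now smile!")
            else:
                prompts += 1
                if prompts > max_prompts:
                    return None  # give up rather than nag indefinitely
                say("I can't see your eyes. Please face the camera.")
        elif state == "WAIT_SMILE":
            if not frame.eyes_found:
                # The user looked away, so go back to the first step.
                state = "WAIT_FACE"
                say("Please look at the camera again.")
            elif frame.mouth_moving:
                # Mouth motion right after "smile" is taken as a smile.
                return index
    return None
```

A caller would feed it parser results frame by frame and pass any function that delivers prompts to the user, for example `direct_smile_capture(frames, print)`. Mouth motion seen before the eyes are found is deliberately ignored, since without the interaction context it carries no reliable meaning.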

Related Project: ACAL

Publications

Ana Ramírez Chang and Marc Davis. "Active Capture Design Case Study: SIMS Faces." In Proceedings of Conference on Designing for User eXperience (DUX 2005) in San Francisco, California, Forthcoming 2005. paper video
	We present a design case study for the SIMS Faces application. The SIMS Faces application is an Active Capture application that works with the user to take her picture and record her saying her name for inclusion on the department web page. Active Capture applications are systems that capture and direct human action by working with the user, directing her and monitoring her progress, to complete a common goal, in this case taking her picture when she is smiling and looking at the camera. In addition to producing a working Active Capture application, the project also included studying the design of Active Capture applications. The team conducted an ethnographic study to inform the design of the interaction with the user, prototyped a set of tools to support the design process, and iterated a design process involving bodystorming, a Wizard-of-Oz study, the prototyped tools, and a user test of the implemented application.

Ana Ramírez Chang and Marc Davis. "Designing Systems that Direct Human Action." In Proceedings of CHI 2005, Conference on Human Factors in Computing Systems (2005). paper
	In this paper we present a user-centered design process for Active Capture systems. These systems bring together techniques from human-human direction practice, multimedia signal processing, and human-computer interaction to form computational systems that automatically analyze and direct human action. The interdependence between the design of multimedia signal parsers and the user interaction script presents a unique challenge in the design process. We have developed an iterative user-centered design process for Active Capture systems that incorporates bodystorming, wizard-of-oz user studies, iterative parser design, and traditional user studies, based on our experience designing a portrait camera system that works with the user to record her name and take her picture. Based on our experiences, we lay out a set of recommendations for future tools to support such a design process.

Ana Ramírez and Marc Davis. “Active Capture and Folk Computing.” In Proceedings of IEEE International Conference on Multimedia and Expo (ICME 2004) Special Session on Folk Information Access Through Media in Taipei, Taiwan, IEEE Computer Society Press, 2004. paper presentation
	The domains of folk computing applications touch on areas of interest to people around the world but are of pressing need to those in the developing world who often lack access to basic services and rights: especially health care, education, nutrition, and protection of human rights. In this paper we describe how a new paradigm for media capture, called Active Capture, and toolkit support for creating applications of this type work toward supporting the development of multimedia applications and interfaces for folk computing.

Jeffrey Heer, Nathaniel S. Good, Ana Ramírez, Marc Davis, and Jennifer Mankoff. "Presiding Over Accidents: System Direction of Human Action." In Proceedings of CHI 2004, Conference on Human Factors in Computing Systems (2004). paper video
	As human-computer interaction becomes more closely modeled on human-human interaction, new techniques and strategies for human-computer interaction are required. In response to the inevitable shortcomings of recognition technologies, researchers have studied mediation: interaction techniques by which users can resolve system ambiguity and error. In this paper we approach the human-computer dialogue from the other side, examining system-initiated direction and mediation of human action. We conducted contextual interviews with a variety of experts in fields involving human-human direction, including a film director, photographer, golf instructor, and 911 operator. Informed by these interviews and a review of prior work, we present strategies for directing physical human action and an associated design space for systems that perform such direction. We illustrate these concepts with excerpts from our interviews and with our implemented system for automated media capture or “Active Capture,” in which an unaided computer system uses techniques identified in our design space to act as a photographer, film director, and cinematographer.

Marc Davis, Jeffrey Heer and Ana Ramírez. "Active Capture: Automatic Direction for Automatic Movies (Demonstration Description)." In Proceedings of 11th Annual ACM International Conference on Multimedia in Berkeley, California, ACM Press, 88-89, 2003. description video
	The Active Capture demonstration is part of a new computational media production paradigm that transforms media production from a manual mechanical process into an automated computational one that can produce mass customized and personalized media integrating video of non-actors. Active Capture leverages media production knowledge, computer vision and audition, and user interaction design to automate direction and cinematography and thus enables the automatic production of annotated, high quality, reusable media assets. The implemented system automates the process of capturing a non-actor performing two simple reusable actions (“screaming” and “turning her head to look at the camera”) and automatically integrates those shots into various commercials and movie trailers.

Class Project: Multimedia Information (IS246) Spring'03 - Ana Ramírez and Ka-Ping Yee. Active Capture Visual Language. Paper
	This paper presents and explains the design of the Active Capture Automation Language (ACAL), a declarative constraint-based programming language for controlling automated media capture systems. We will begin by providing some background on the concept of Active Capture and describing the vision of reusable media it supports. Then we will justify the need for a specialized programming language and situate ACAL in its role within an Active Capture system architecture. This will set the stage for discussing how ACAL’s features meet particular design challenges, and defining ACAL in detail. Finally, we will look at the existing languages we considered and the other languages we designed in pursuit of this problem, to give some idea of the path we took in our design thinking.

Presentations:

Current Work presented at the Berkeley Institute of Design / Group for User Interface Research seminar - 3 Nov 2005. ppt

Example Active Capture Applications

Implemented Active Capture Applications

Kiosk Demo (video)

The Kiosk Demo is similar to a photo kiosk in the mall, but instead of taking the user’s picture, it takes a few videos of the user and automatically creates a personalized commercial or movie trailer starring the user. The Kiosk Demo has two parts. The Active Capture part works with the user to capture a shot of her looking at the camera and screaming, and a shot of her turning her head to look at the camera. The second part uses Adaptive Media technology described in [Dav03c, DL96]: the shots of the user screaming and turning her head are automatically edited into a variety of commercials and movie trailers, including a 7up commercial, an MCI commercial, and the Terminator II movie trailer.

SIMS Faces (video)

The SIMS Faces application works with the user to achieve two goals: taking her picture and recording her saying her name.