CS 377B: Special Topics in Human Computer Interface; Machine Perception for Human Computer Interface.

Instructor: Trevor Darrell, Interval Research. (trevor @ interval.com) 842-6015

Meeting time/place (spring '97): Thursdays 3pm-6pm, Gates 359

Interacting with automated systems is a routine aspect of daily life, yet existing interfaces to computer systems typically fail to respect the basic dynamics of interpersonal communication. They ignore the natural interface modalities people use--body language, pose, expression, and gestures--and as a result are often found to be awkward or unpleasant. This seminar will explore the use of machine perception techniques to build computer interfaces that are no longer deaf and blind to their users, creating interfaces which can directly perceive a user's state and respond accordingly.

Course Goals

Survey machine perception literature on techniques which analyze signals from human users. Main emphasis is on visual processing for face, hand, and body tracking. Also examine robust acoustic techniques, multimodal processing, and electric field sensing.
Critically evaluate assumptions and performance of these methods on HCI tasks.
Identify applications where the assumptions of a perception algorithm are likely to be valid and its performance adequate; explore prototypes of the most feasible of these in class projects.

Recommended Background: Some previous exposure to image or signal processing, statistics, and discrete math will be assumed during presentations and discussion.

Required preparation: Each week, 3-4 papers will be assigned, and 1-2 will be optional.

Students will be responsible for (co)presenting a short overview of one assigned paper or web site per meeting, and leading a discussion of its merits.

A short summary of each paper is due via email before noon on the day of class, with a one or two paragraph response to each of the following questions:

What is the scope of the reported method and what are its major limitations?
What extensions are proposed or are obvious? (or non-obvious!)
What applications are suggested and how does the method perform on them?
What other applications seem appropriate for the method?

Additionally, students are encouraged to explore the web sites listed for each class, as well as the web sites associated with the paper authors or the topic of the day, and submit an annotated link page to relevant sites on the web.

Class Project: (required of those taking CS377B for credit) can be implementation/testing of an existing method for an HCI application, a new algorithm or theory development, or a survey paper covering in detail results in a particular topic area. One page project proposals due by 4/24; final presentation on last day of class (5/29).

Grading: 50% in-class participation and written summaries; 50% project.

Class structure:

overview 20min
paper presentations 60min
break 10min
demo / web presentations 30min
discussion and application brainstorming 30min

Syllabus (tentative)

4/3 INTRODUCTION (short class, ends 4pm -- if possible also go to ALEX WAIBEL's PCD Seminar, Friday April 4, 12:30-2pm, Gates B03)

4/10 FACE RECOGNITION - INTENSITY METHODS

Kirby, M., and Sirovich, L., Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces, IEEE Trans. PAMI 12:103-108, Jan 1990. (optional)
Turk, M., & Pentland, A., Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 71-86. 1991.
Poggio, T., & Sung, K.K., Example-based learning for view-based human face detection. Proceedings of the ARPA IU Workshop '94, II:843-850. 1994.
Rowley, H., Baluja, S., and Kanade, T., Neural Network-Based Face Detection, Proc. IEEE Conf. Computer Vision and Pattern Recognition, CVPR-96, pp. 203-207, IEEE Computer Society Press. 1996. (optional)
Belhumeur, P., Hespanha, J., and Kriegman, D., Eigenfaces vs. Fisherfaces: Recognition using Class Specific Linear Projection, Proc. European Conf. Computer Vision, ECCV-96, 1996.

Links:

CMU facedetector page
Demo (if SGI Indy is available): "FaceIt" ScreenLock

4/17 FACE RECOGNITION - DEFORMABLE MODELS

Papers:

Yuille, A., Cohen, D.S., and Hallinan, P., Feature Extraction from Faces using Deformable Templates, Intl. Journal Computer Vision, Vol 8. pp 104-109, 1992.
Wiskott, L., Fellous, J., Kruger, N., von der Malsburg, C., "Face Recognition and Gender Determination", Proc. Intl. Workshop on Automatic Face and Gesture Recognition, pp. 92-97, Zurich, 1995.
Lanitis, A., Taylor, C., Cootes, T., A Unified Approach to Coding and Interpreting Face Images, Proc. Fifth Intl. Conf. on Computer Vision, ICCV'95, pp. 368-373, IEEE Computer Society Press, 1995.
Moghaddam, B., Nastar, C., and Pentland, A., Bayesian Face Recognition using Deformable Intensity Surfaces, Proc. Conf. Computer Vision and Pattern Recognition, CVPR'96, pp. 638-645, IEEE Computer Society Press, 1996 (optional)
Beymer, D., Feature Correspondence by Interleaving Shape and Texture Computations, Proc. Conf. Computer Vision and Pattern Recognition, CVPR'96, pp. 921-928, IEEE Computer Society Press, 1996 (optional)

Links:

MIT photobook/feret page

4/24 EXPRESSIONS AND MOTION TRACKING

Papers:

Terzopoulos, D. & Waters, K. (1990). Analysis of facial images using physical and anatomical models. Proceedings of the International Conference on Computer Vision, 1990, 727-732. (optional)
Essa, I., Darrell, T., & Pentland, A., Tracking Facial Motion, Proc. IEEE Nonrigid and Articulated Motion Workshop, 1995
Black, M., and Yacoob, Y., Tracking and Recognizing Rigid and Non-Rigid Facial Motions using Local Parametric Models of Image Motion, Proc. Intl. Conf. Computer Vision ICCV'95, pp. 374-381, 1995
Hager, G. & Belhumeur, P., Real-time Tracking of Image Regions with Changes in Geometry and Illumination, Proc. CVPR-96, pp. 403-410, 1996

Links:

Demo (if SGI Indy is available): Yale Vision Group's XVision real-time expression tracking

5/1 HANDS AND GESTURES

Ahmad S., A Usable Real-Time 3D Hand Tracker, 28th Asilomar Conference on Signals, Systems and Computers, IEEE Computer Society Press, 1995.
Freeman, W. T., and Weissman, C., Television control by hand gestures, Proc. Intl. Workshop on Automatic Face and Gesture Recognition, pp. 179-183, Zurich, 1995. (optionally also see related paper on Orientation Histograms for Gesture Recognition in same proceedings.)
Wilson, A., & Bobick, A., A State-based Technique for the Summarization and Recognition of Gesture, Proc. Intl. Conf. Computer Vision, ICCV'95, pp. 374-381, IEEE Computer Society Press. 1995 (optionally also see related WCV'95 paper)
Rehg, J., & Kanade, T., Model-based tracking of self-occluding articulated objects. In Proc. of Intl. Conf. on Computer Vision, ICCV'95, pp. 612-617, IEEE Computer Society Press. (or see longer technical report), 1995.

5/8 BODY TRACKING AND ACTIVITY DETECTION

Darrell, T., Maes, P., Blumberg, B., and Pentland, A., A Novel Environment for Situated Vision and Behavior, Workshop on Visual Behaviors, CVPR-94, IEEE Computer Society Press, 1994.
Kahn, R., Swain, M., Prokopowicz, P., and Firby, R., Gesture Recognition using the Perseus Architecture, (or html) Proc. Conf. Computer Vision and Pattern Recognition, CVPR'96, pp. 734-741, IEEE Computer Society Press, 1996
Polana, R., and Nelson, R., Low Level Recognition of Human Motion, IEEE Computer Society Workshop on Motion of Nonrigid and Articulated Objects, Austin, TX, October 1994.
Zimmerman, T., Smith, J., Paradiso, J., Allport, D., and Gershenfeld, N., Applying Electric Field Sensing to Human-Computer Interfaces (or html), Proceedings CHI-95, ACM, 1995.

Links:

MIT Media Lab Perceptual Computing Group's KidsRoom project
MIT Media Lab Physics and Media Group's Personal Radar project

5/15 SPEECH AND ACOUSTIC METHODS SAMPLER

Rabiner, L., A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings IEEE, 77(2), 257-286, 1989.
Van Immerseel, L. M., and Martens, J.-P., Pitch and voiced/unvoiced determination with an auditory model, J. Acoust. Soc. Am, 91(6), June 1992.
Duda, R., Connectionist Models for Auditory Scene Analysis, in Advances in Neural Info. Proc. Systems 6, pp. 1069-1076, Morgan Kaufmann, 1994.
Wang, H., and Chu, P., Voice Source Localization for Automatic Camera Pointing System in Videoconferencing, Proc. ICASSP-97, pp. 187-190, April 1997.

Links:

Duda, R., Sound Localization Research

5/22 MULTIMODAL SPEECH RECOGNITION

Papers:

Stork, D., Speech Reading: An overview of image processing, feature extraction, sensory integration and pattern recognition techniques, 2nd Intl. Conf. on Automatic Face and Gesture Recognition, pp. xvi-xxvi, Oct. 1996.
Pentland, A. P., and Mase, K., Lip reading: Automatic visual recognition of spoken words, Proceedings Image Understanding and Machine Vision, vol. 14, pp. 124-127, and MIT Media Lab TR-117, Jan 1989.
P. Duchnowski, M. Hunke, D. Büsching, U. Meier and A. Waibel, Toward Movement-Invariant Automatic Lipreading and Speech Recognition, Proc. Intern. Conference on Acoustics, Speech and Signal Processing, 1995. (for underlying method see also earlier "see me, hear me" paper)
Bregler, C., Omohundro, S., Nonlinear Manifold Learning for Visual Speech Recognition, Proc. Fifth Intl. Conf. Computer Vision, June 1995.

AFFECTIVE / EMOTIONAL PROCESSING

Papers:

A.R. Damasio, Descartes' Error: Emotion, Reason and the Human Brain, New York: Grosset/Putnam Press, 1994. (excerpts provided).
Picard, R., Affective Computing, MIT Media Lab TR-321, November 1995. (abbreviated version of forthcoming MIT Press book.)

Links:

Human Affect Sensing

5/29 NO CLASS (PROJECTS DUE BY END OF QUARTER)

(Last Modified 5/7/97, Trevor Darrell)