CS 377B: Special Topics in Human Computer Interface; Machine Perception
for Human Computer Interface.
Instructor: Trevor Darrell, Interval Research.
(trevor @ interval.com) 842-6015
Meeting time/place (spring '97): Thursdays 3pm-6pm, Gates 359
Interacting with automated systems is a routine aspect of daily life,
yet existing interfaces to computer systems typically fail to respect the
basic dynamics of interpersonal communication. They ignore the natural
interface modalities people use--body language, pose, expression, and gestures--and
as a result are often found to be awkward or unpleasant. This seminar will
explore the use of machine perception techniques to build computer interfaces
that are no longer deaf and blind to their users, creating interfaces which
can directly perceive a user's state and respond accordingly.
Course Goals
-
Survey machine perception literature on techniques which analyze signals
from human users. Main emphasis is on visual processing for face, hand,
and body tracking. Also examine robust acoustic techniques, multimodal
processing, and electric field sensing.
-
Critically evaluate assumptions and performance of these methods on HCI
tasks
-
Identify applications where the assumptions of a perception algorithm are
likely to be valid and its performance adequate; explore prototypes of the
most feasible of these in class projects.
Recommended Background: Some previous exposure
to image or signal processing, statistics, and discrete math will be assumed
during presentations and discussion.
Required preparation: Each week, 3-4 papers will
be assigned, and 1-2 will be optional.
Students will be responsible for (co)presenting a short overview of
one assigned paper or web site per meeting, and leading a discussion of
its merits.
A short summary of each paper is due via email before noon on the day
of class, with a one or two paragraph response to each of the following
questions:
-
What is the scope of the reported method and what are its major limitations?
-
What extensions are proposed or are obvious? (or non-obvious!)
-
What applications are suggested and how does the method perform on them?
-
What other applications seem appropriate for the method?
Additionally, students are encouraged to explore the web sites listed for
each class, as well as the web sites associated with the paper authors
or the topic of the day, and submit an annotated link page to relevant
sites on the web.
Class Project: (required of those taking CS377B for credit) can be the
implementation/testing of an existing method for an HCI application, a new
algorithm or theory development, or a survey paper covering in detail results
in a particular topic area. One-page project proposals are due by 4/24; final
presentation on last day of class (5/29).
Grading: 50% in-class participation and written summaries; 50%
project.
Class structure:
-
overview 20min
-
paper presentations 60min
-
break 10min
-
demo / web presentations 30min
-
discussion and application brainstorming 30min
Syllabus (tentative)
4/3 INTRODUCTION (short class, ends 4pm -- if possible also go to
ALEX WAIBEL's PCD Seminar, Friday April 4, 12:30-2pm, Gates B03)
4/10 FACE RECOGNITION - INTENSITY METHODS
Papers:
-
Kirby, M., and Sirovich, L., Application of the Karhunen-Loeve Procedure
for the Characterization of Human Faces, IEEE Trans. PAMI 12:103-108, Jan
1990. (optional)
-
Turk, M., & Pentland, A., Eigenfaces for recognition. Journal of Cognitive
Neuroscience, 3(1), 71-86. 1991.
-
Poggio, T., & Sung, K.K., Example-based learning for view-based human face
detection. Proceedings of the ARPA IU Workshop '94, II:843-850. 1994.
-
Rowley, H., Baluja, S., and Kanade, T., Neural Network-Based Face Detection,
Proc. IEEE Conf. Computer Vision and Pattern Recognition, CVPR-96,
pp. 203-207, IEEE Computer Society Press. 1996. (optional)
-
Belhumeur, P., Hespanha, J., and Kriegman, D., Eigenfaces vs. Fisherfaces:
Recognition using Class Specific Linear Projection, Proc. European Conf.
Computer Vision, ECCV-96, 1996.
Links:
-
CMU face detector page
-
Demo (if SGI Indy is available): "FaceIt" ScreenLock
4/17 FACE RECOGNITION - DEFORMABLE MODELS
-
Yuille, A., Cohen, D.S., and Hallinan, P., Feature Extraction from Faces
using Deformable Templates, Intl. Journal Computer Vision, Vol 8.
pp 104-109, 1992.
-
Wiskott, L., Fellous, J., Kruger, N., von der Malsburg, C., "Face
Recognition and Gender Determination", Proc. Intl. Workshop on Automatic
Face and Gesture Recognition, pp. 92-97, Zurich, 1995.
-
Lanitis, A., Taylor, C., Cootes, T., A Unified Approach to Coding and
Interpreting Face Images, Proc. Fifth Intl. Conf. on Computer Vision,
ICCV'95, pp. 368-373, IEEE Computer Society Press, 1995.
-
Moghaddam, B., Nastar, C., and Pentland, A., Bayesian Face Recognition using
Deformable Intensity Surfaces, Proc. Conf. Computer Vision and Pattern
Recognition, CVPR'96, pp. 638-645, IEEE Computer Society Press, 1996
(optional)
-
Beymer, D., Feature Correspondence by Interleaving Shape and Texture
Computations, Proc. Conf. Computer Vision and Pattern Recognition, CVPR'96,
pp. 921-928, IEEE Computer Society Press, 1996 (optional)
Links:
-
MIT Photobook/FERET page
4/24 EXPRESSIONS AND MOTION TRACKING
-
Terzopoulos, D., & Waters, K. (1990). Analysis of facial images using
physical and anatomical models. Proceedings of the International Conference
on Computer Vision, 1990, 727-732. (optional)
-
Essa, I., Darrell, T., & Pentland, A., Tracking Facial Motion, Proc. IEEE
Nonrigid and Articulated Motion Workshop, 1995
-
Black, M., and Yacoob, Y., Tracking and Recognizing Rigid and Non-Rigid
Facial Motions using Local Parametric Models of Image Motion, Proc. Intl.
Conf. Computer Vision, ICCV'95, pp. 374-381, 1995
-
Hager, G. & Belhumeur, P., Real-time Tracking of Image Regions with Changes
in Geometry and Illumination, Proc. CVPR-96, pp. 403-410, 1996
Links:
-
Demo (if SGI Indy is available): Yale Vision Group's XVision
(real-time expression tracking)
5/1 HANDS AND GESTURES
Papers:
-
Ahmad, S., A Usable Real-Time 3D Hand Tracker, 28th Asilomar Conference on
Signals, Systems and Computers, IEEE Computer Society Press, 1995.
-
Freeman, W. T., and Weissman, C., Television control by hand gestures, Proc.
Intl. Workshop on Automatic Face and Gesture Recognition, pp. 179-183,
Zurich, 1995. (optionally also see related paper on Orientation Histograms
for Gesture Recognition in same proceedings.)
-
Wilson, A., & Bobick, A., A State-based Technique for the Summarization and
Recognition of Gesture, Proc. Intl. Conf. Computer Vision, ICCV'95,
pp. 374-381, IEEE Computer Society Press. 1995 (optionally also see related
WCV'95 paper)
-
Rehg, J., & Kanade, T., Model-based tracking of self-occluding articulated
objects. In Proc. of Intl. Conf. on Computer Vision, ICCV'95, pp. 612-617,
IEEE Computer Society Press, 1995. (or see longer technical report)
5/8 BODY TRACKING AND ACTIVITY DETECTION
Papers:
-
Darrell, T., Maes, P., Blumberg, B., and Pentland, A., A Novel Environment
for Situated Vision and Behavior, Workshop on Visual Behaviors, CVPR-94,
IEEE Computer Society Press, 1994.
-
Kahn, R., Swain, M., Prokopowicz, P., and Firby, R., Gesture Recognition
using the Perseus Architecture (or html), Proc. Conf. Computer Vision and
Pattern Recognition, CVPR'96, pp. 734-741, IEEE Computer Society Press, 1996
-
Polana, R., and Nelson, R., Low Level Recognition of Human Motion, IEEE
Computer Society Workshop on Motion of Nonrigid and Articulated Objects,
Austin, TX, October 1994.
-
Zimmerman, T., Smith, J., Paradiso, J., Allport, D., and Gershenfeld, N.,
Applying Electric Field Sensing to Human-Computer Interfaces (or html),
Proceedings CHI-95, ACM, 1995.
Links:
-
MIT Media Lab Perceptual Computing Group's KidsRoom project
-
MIT Media Lab Physics and Media Group's Personal Radar project
5/15 SPEECH AND ACOUSTIC METHODS SAMPLER
Papers:
-
Rabiner, L., A Tutorial on Hidden Markov Models and Selected Applications in
Speech Recognition, Proceedings IEEE, 77(2), 257-286, 1989.
-
Van Immerseel, L. M., and Martens, J.-P., Pitch and voiced/unvoiced determination
with an auditory model, J. Acoust. Soc. Am, 91(6), June 1992.
-
Duda, R., Connectionist Models for Auditory Scene Analysis, in Advances in
Neural Info. Proc. Systems 6, pp. 1069-1076, Morgan Kaufmann, 1994.
-
Wang, H., and Chu, P., Voice Source Localization for Automatic Camera Pointing
System in Videoconferencing, Proc. ICASSP-97, pp. 187-190, April 1997.
Links:
-
Duda, R., Sound Localization Research
5/22 MULTIMODAL SPEECH RECOGNITION
-
Stork, D., Speech Reading: An overview of image processing, feature
extraction, sensory integration and pattern recognition techniques, 2nd Intl.
Conf. on Automatic Face and Gesture Recognition, pp. xvi-xxvi, Oct. 1996.
-
Pentland, A. P., and Mase, K., Lip reading: Automatic visual recognition
of spoken words, Proceedings Image Understanding and Machine Vision, vol.
14, pp. 124-127, and MIT Media Lab TR-117, Jan 1989.
-
P. Duchnowski, M. Hunke, D. Büsching, U. Meier and A. Waibel, Toward
Movement-Invariant Automatic Lipreading and Speech Recognition, Proc.
Intern. Conference on Acoustics, Speech and Signal Processing, 1995 (for
underlying method see also earlier "see me, hear me" paper)
-
Bregler, C., Omohundro, S., Nonlinear Manifold Learning for Visual Speech
Recognition, Proc. Fifth Intl. Conf. Computer Vision, June 1995.
AFFECTIVE / EMOTIONAL PROCESSING
Papers:
-
A.R. Damasio, Descartes' Error: Emotion, Reason and the Human Brain, New
York: Grosset/Putnam Press, 1994. (excerpts provided).
-
Picard, R., Affective Computing, MIT Media Lab TR-321, November 1995.
(abbreviated version of forthcoming MIT Press book.)
Links:
-
Human Affect Sensing
5/29 NO CLASS (PROJECTS DUE BY END OF QUARTER)
(Last Modified 5/7/97, Trevor Darrell)