We show how to formulate "direct" rigid motion tracking for the case where real-time depth is available. A depth analogue to the well-known brightness constancy constraint equation is derived and shown to be more stable than the intensity-only constraint. Results tracking real and synthetic heads are presented.
Real-time visual person tracking using multiple visual processing modalities
M. Harville, A. Rahimi, T. Darrell, G. Gordon, J. Woodfill, 3D Pose Tracking with Linear Depth and Brightness Constraints, Proceedings of ICCV99, Bombay, pp. 206-213, 1999. See Technical Report 1999-006.
M. Covell, A. Rahimi, M. Harville, T. Darrell, Articulated-Pose Estimation using Brightness and Depth-Constancy Constraints, Proceedings of CVPR00, Hilton Head, 2000. (pdf)
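By analogy with the brightness constancy equation I_x u + I_y v + I_t = 0, a depth constancy constraint can be written as Z_x u + Z_y v + Z_t = w. Here (u, v) is the image velocity of a surface point and w is its velocity along the depth axis. The point's depth is expected to change by w rather than stay constant, and a first-order Taylor expansion of Z(x + u, y + v, t + 1) = Z(x, y, t) + w gives the equation. The Python sketch below is a deliberately simplified, hypothetical illustration rather than the formulation of the papers above. It assumes orthographic projection and pure 3D translation, so (u, v, w) is shared by all pixels and can be recovered by linear least squares from two depth images. The function names are illustrative.

```python
def depth_gradients(Z, x, y):
    """Central-difference spatial derivatives (Z_x, Z_y) of depth image Z at (x, y)."""
    zx = (Z[y][x + 1] - Z[y][x - 1]) / 2.0
    zy = (Z[y + 1][x] - Z[y - 1][x]) / 2.0
    return zx, zy


def solve3(M, b):
    """Solve the 3x3 system M p = b by Gaussian elimination with partial pivoting."""
    A = [list(M[i]) + [b[i]] for i in range(3)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 4):
                A[r][c] -= f * A[col][c]
    p = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        p[r] = (A[r][3] - sum(A[r][c] * p[c] for c in range(r + 1, 3))) / A[r][r]
    return p


def estimate_translation(Z0, Z1):
    """Least-squares (u, v, w) with Z_x*u + Z_y*v + Z_t = w at every interior pixel.

    Hypothetical simplification: orthographic projection, pure 3D translation.
    """
    AtA = [[0.0] * 3 for _ in range(3)]
    Atb = [0.0] * 3
    height, width = len(Z0), len(Z0[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            zx, zy = depth_gradients(Z0, x, y)
            zt = Z1[y][x] - Z0[y][x]      # temporal depth derivative
            row = (zx, zy, -1.0)         # coefficients of (u, v, w)
            rhs = -zt
            for i in range(3):            # accumulate the normal equations
                Atb[i] += row[i] * rhs
                for j in range(3):
                    AtA[i][j] += row[i] * row[j]
    return solve3(AtA, Atb)


# Synthetic check: a curved surface moved by (0.1, -0.05) pixels and 0.2 depth units.
def surface(x, y):
    return 10.0 + 0.05 * x * x + 0.03 * y * y + 0.02 * x * y


Z0 = [[surface(x, y) for x in range(20)] for y in range(20)]
Z1 = [[surface(x - 0.1, y + 0.05) + 0.2 for x in range(20)] for y in range(20)]
u, v, w = estimate_translation(Z0, Z1)
```

Because the test surface is quadratic, the linearization is exact up to a constant second-order term. As a result, u and v come out at 0.1 and -0.05, and w comes out at 0.2 plus a small offset (about 0.0005).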
For details see a PowerPoint slide show describing this work, from a presentation given at Intel Corporation on Sept. 2, 1998.
T. Darrell, G. Gordon, M. Harville, J. Woodfill, Integrated person tracking using stereo, color, and pattern detection, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR '98), pp. 601-609, Santa Barbara, June 1998. (Describes the full system, including the long-term tracking/identification framework.) Technical Report 1998-021 (html) (compressed postscript) (pdf)
This system served as the tracking substrate for the Magic Morphing Mirror / Mass Hallucinations interactive installation at SIGGRAPH and the Silicon Valley Tech Museum of Innovation.
We explored how real-time range information can enhance background modeling such as that used in the MIT Forest of Sensors and Pfinder projects. In certain cases depth can detect objects that are not discriminable from intensity images alone; in addition, depth makes model acquisition easier.
For details see the CVPR99 paper:
G. Gordon, T. Darrell, M. Harville, J. Woodfill, Background Estimation and Removal Based on Range and Color, Proceedings of CVPR99, Ft. Collins CO, pp II:459-464, 1999. Technical Report 1998-071 (html) (compressed postscript) (pdf)
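The following is a minimal Python sketch of how range can help with background estimation. The data layout and rules are assumptions for illustration, not the method of the CVPR99 paper above:

- Each pixel carries a depth (None where stereo gave no range) and an RGB color.
- The background depth at a pixel is the farthest valid depth seen during training. This assumes that foreground objects only ever occlude the background, which lets the model be learned while people walk through the scene.
- A pixel is labeled foreground if it is measurably closer than that depth. A color test is used only where range is missing.

The tolerances depth_tol and color_tol are hypothetical.

```python
def learn_background(frames):
    """frames: list of images; each image is a list of (depth, (r, g, b)) pixels.

    Hypothetical rule: background = the farthest valid depth ever seen at the pixel,
    together with the color observed at that moment.
    """
    background = []
    for history in zip(*frames):                      # all observations of one pixel
        valid = [(d, c) for d, c in history if d is not None]
        if valid:
            background.append(max(valid, key=lambda dc: dc[0]))
        else:
            background.append((None, history[0][1]))  # no range ever: keep first color
    return background


def color_distance(c1, c2):
    """Largest per-channel absolute difference."""
    return max(abs(a - b) for a, b in zip(c1, c2))


def segment(frame, background, depth_tol=0.1, color_tol=30):
    """Return a foreground mask (list of bools) for one frame."""
    mask = []
    for (d, c), (bd, bc) in zip(frame, background):
        if d is not None and bd is not None:
            mask.append(d < bd - depth_tol)                 # closer than the background
        else:
            mask.append(color_distance(c, bc) > color_tol)  # range missing: use color
    return mask


GRAY, RED = (100, 100, 100), (200, 0, 0)
training = [
    [(3.0, GRAY), (4.0, GRAY), (None, GRAY)],
    [(1.5, RED), (4.0, GRAY), (None, GRAY)],   # a person passes in front of pixel 0
    [(3.0, GRAY), (4.0, GRAY), (None, GRAY)],
]
bg = learn_background(training)
# Pixel 0 now holds a gray object at depth 1.0: same color as the wall, but closer.
mask = segment([(1.0, GRAY), (4.0, GRAY), (None, RED)], bg)
```

Pixel 0 in the test frame is flagged as foreground even though its color matches the background, a case an intensity-only model would miss.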
Robust image correspondence using a local image similarity transform
Image correspondence is one of the most fundamental problems in computer vision, and is a critical step in recovering depth or shape from motion or stereo cues. This paper describes a class of correspondence problems which are particularly difficult for existing methods, yet which occur in real images: those involving a low-contrast foreground occluder. We show how a simple new image transform can be combined with traditional matching metrics to solve this problem.
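The transform itself is defined in the CVPR98 paper listed below. The Python sketch here only illustrates the general idea under stated assumptions and does not reproduce that paper's definition. Along each of eight rays leaving a pixel, it accumulates a Gaussian similarity between the center intensity and each pixel on the ray; sigma is a hypothetical parameter. When the center pixel lies on a low-contrast foreground occluder, background pixels contribute almost nothing. The transform therefore barely changes when the background behind the occluder changes, whereas a raw patch comparison does.

```python
import math

# Eight ray directions (dx, dy) leaving the center pixel.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def radial_similarity(img, x, y, radius, sigma=5.0):
    """For each ray, the running sum of similarity to the center intensity, r = 1..radius."""
    center = img[y][x]
    rays = []
    for dx, dy in DIRECTIONS:
        acc, ray = 0.0, []
        for r in range(1, radius + 1):
            px, py = x + r * dx, y + r * dy
            if 0 <= py < len(img) and 0 <= px < len(img[0]):
                diff = img[py][px] - center
                acc += math.exp(-(diff * diff) / (sigma * sigma))
            ray.append(acc)
        rays.append(ray)
    return rays


def ssd(a, b):
    """Sum of squared differences between two equally shaped nested lists."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def patch(img, x, y, radius):
    """Square window of the image centered on (x, y)."""
    return [row[x - radius:x + radius + 1] for row in img[y - radius:y + radius + 1]]


def scene(background):
    """7x7 image: a 3x3 foreground square of intensity 50 on a uniform background."""
    return [[50 if 2 <= r <= 4 and 2 <= c <= 4 else background for c in range(7)]
            for r in range(7)]


a, b = scene(0), scene(120)   # same occluder, different backgrounds behind it
raw = ssd(patch(a, 3, 3, 3), patch(b, 3, 3, 3))
rcst = ssd(radial_similarity(a, 3, 3, 3), radial_similarity(b, 3, 3, 3))
```

Here the raw patch SSD between the two scenes is large (576000), while the SSD between the two transforms is essentially zero.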
A PowerPoint slide show is available containing the slides presented with the CVPR98 paper:
T. Darrell, A radial cumulative similarity transform for robust image correspondence, Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR 98), pp. 656-662, Santa Barbara, June 1998. Technical Report 1997-090 (html) (compressed postscript) (pdf)
Together with Michelle Covell, we applied this method to dynamic contour tracking. See the CVPR99 paper:
M. Covell and T. Darrell, Dynamic Occluding Contours: A New External-energy Term for Snakes,
Proceedings of CVPR99, Ft. Collins CO, pp II:232-238, 1999. (compressed postscript)
Rendering images of articulated figures from examples
Rendering realistic images of people or other articulated figures is a challenging problem in computer graphics. We are interested in the image-based rendering of articulated figures, driven entirely from example images or sequences. We have explored two classes of techniques, one which assumes no prior model of the articulated structure and uses general non-rigid interpolation methods, and another which requires a previously tracked kinematic chain representation.
In the first technique, we use function interpolation to render human arm images from examples of arm images at various endpoint locations. Since the mapping from endpoint to configuration is generally non-unique, traditional interpolation and function approximation methods fail and render physically impossible results. However, if we discover a decomposition of the example set into subsets which are locally well-behaved functions, we can safely render arm images using the appropriate subset. Finding such a decomposition can be considered a clustering problem; we show how an on-line cross-validation technique can discover regions of the mapping which are locally valid functions.
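The decomposition step can be illustrated with a toy one-dimensional version. This sketch is an assumption-laden stand-in, not the NIPS paper's algorithm. Each example maps an endpoint coordinate x to a single configuration value y, and the mapping has two branches, like elbow-up and elbow-down arm poses. Examples arrive one at a time. Each example is first predicted by piecewise-linear interpolation from every existing subset, which serves as a held-out test of whether that subset would still behave like a function with the example added. The example joins the best-predicting subset only if the error is within a hypothetical tolerance tol; otherwise it starts a new subset.

```python
def predict(subset, x):
    """Piecewise-linear interpolation of y at x from (x, y) pairs, clamped at the ends."""
    pts = sorted(subset)
    if x <= pts[0][0]:
        return pts[0][1]
    if x >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)


def decompose(examples, tol):
    """On-line split of (x, y) examples into subsets that each behave like a function."""
    subsets = []
    for x, y in examples:
        best, best_err = None, float("inf")
        for s in subsets:
            err = abs(predict(s, x) - y)   # predict the example before it is added
            if err < best_err:
                best, best_err = s, err
        if best is not None and best_err <= tol:
            best.append((x, y))
        else:
            subsets.append([(x, y)])       # no subset explains it: start a new one
    return subsets


# Two branches (y = x and y = -x) observed at the same endpoints, interleaved.
examples = [(1, 1), (1, -1), (2, 2), (2, -2), (3, 3), (3, -3)]
subsets = decompose(examples, tol=1.5)
```

Interpolating within one subset at x = 2.5 gives 2.5 or -2.5. Averaging the two examples at x = 2 across the whole set would instead give 0, a configuration that belongs to neither branch.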
If we have a previously tracked kinematic chain description, then we can use image projection equations which exploit the rigid-body geometry of individual links in the chain. However, a purely rigid interpolation mechanism will fail to model significant variation in the image, such as muscle deformation or folds due to clothing. We derive a hybrid technique which combines rigid and non-rigid modeling techniques, and show how it can be used to render images from very small example sets.
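The idea of combining a rigid warp with a non-rigid correction can be shown with a one-dimensional toy under strong simplifying assumptions. This is not the method of the technical report below. A link is a 1D intensity profile ref, and its rigid motion is an integer translation by theta pixels. Each example image differs from the rigidly moved reference by a deformation, here a "bulge". Residuals are measured in the link's own frame, interpolated linearly in theta, and then carried along by the rigid warp. All names are hypothetical.

```python
def shift(signal, k):
    """Rigid motion of a 1D image: translate by integer k pixels, filling with zeros."""
    out = [0.0] * len(signal)
    for i, value in enumerate(signal):
        if 0 <= i + k < len(signal):
            out[i + k] = value
    return out


def hybrid_render(ref, examples, theta):
    """Render the link at integer pose theta from (theta_i, image_i) examples sorted by theta_i."""
    # Non-rigid residual of each example, expressed in the link's own frame.
    residuals = [(t, [a - b for a, b in zip(shift(img, -t), ref)]) for t, img in examples]
    # Clamp outside the example range; otherwise interpolate between bracketing examples.
    if theta <= residuals[0][0]:
        res = residuals[0][1]
    elif theta >= residuals[-1][0]:
        res = residuals[-1][1]
    else:
        for (t0, r0), (t1, r1) in zip(residuals, residuals[1:]):
            if t0 <= theta <= t1:
                w = (theta - t0) / (t1 - t0)
                res = [(1 - w) * a + w * b for a, b in zip(r0, r1)]
                break
    # Rigidly warp the deformed link into place.
    return shift([a + b for a, b in zip(ref, res)], theta)


ref = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
examples = [
    (0, [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]),   # relaxed link at theta = 0
    (4, [0, 0, 0, 0, 0, 0, 1, 2, 1, 0]),   # moved 4 pixels, with a bulge
]
frame = hybrid_render(ref, examples, 2)
```

At theta = 2 the link appears once, moved 2 pixels and with a half-size bulge. A pixelwise blend of the two example images would instead show two half-strength copies of the link, at positions 2 to 4 and 6 to 8, while a rigid warp alone would miss the bulge.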
T. Darrell, Example Based Image Synthesis of Articulated Figures, Advances in Neural Information Processing Systems 11, (NIPS '98), MIT Press. Technical Report 1998-020 (html) (compressed postscript) (pdf)
A PDF version of my NIPS*99 poster is also available.
A. Hertzmann and T. Darrell, Hybrid rigid and non-rigid image-based modeling of articulated figures. Technical Report 1998-045 (html) (compressed postscript)
This project was an extension of my thesis work, which used hidden-state reinforcement learning together with real-time person tracking to perform active recognition tasks. See Active Recognition using Hidden State Reinforcement Learning for details on the basic framework.
A PowerPoint presentation summarizes these results and my previous person tracking work at MIT. This talk was given at a Robotics colloquium at Berkeley in April 1997.
T. Darrell, Reinforcement learning of active recognition behaviors. Portions of this paper previously appeared in Advances in Neural Information Processing Systems 8, (NIPS '95), pp. 858-864, MIT Press, and Intelligent Robotic Systems, M. Vidyasagar ed., pp. 73-80, Tata Press, 1998. Technical Report 1997-045 (html) (compressed postscript)