Reinforcement Learning of Active Recognition Behaviors

T. Darrell
Interval Research Corp.
1801 Page Mill Road Bl. C
Palo Alto, CA 94304



We show how a concise representation of active recognition behavior (what observations to make to detect a given object) can be derived from hidden-state reinforcement learning techniques. These learning techniques can solve decision process tasks that include perceptual observations, defined formally as Partially Observable Markov Decision Processes (POMDPs). We define recognition within a POMDP context, with an action indicating recognition of the target as well as actions for adjusting the perceptual apparatus or other effectors. An explicit supervised reward signal is provided to the decision process whenever the accept action is performed. With sufficient experience, a memory-based approach to reinforcement learning can find optimal policies which discriminate target from distractor patterns despite considerable perceptual aliasing at any given instant. To avoid perceptual aliasing while learning, all similar experiences are combined when computing the utility of a possible action, including experiences with both target and distractor patterns. By discarding the representation of negative regions of the utility space when learning is complete, and collapsing duplicate representations of positive regions, a representation similar to an augmented Finite State Machine is obtained. We demonstrate our method on the task of recognizing human gestures performed at multiple spatial scales.
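The learning scheme described above can be illustrated with a toy sketch. The patterns, actions, and thresholds below are illustrative assumptions, not taken from the report: two observation sequences (target and distractor) share a common prefix, so individual observations are perceptually aliased; the agent chooses between a hypothetical "look" action (take the next observation) and an "accept" action that ends the episode with a supervised reward. Utilities are estimated memory-style, by averaging returns over all experiences that share the same observation history, and negative-utility regions are discarded afterward to leave a compact accept policy:

```python
import random
from collections import defaultdict

random.seed(0)  # for reproducibility of this sketch

# Toy patterns (assumed for illustration): both begin 0, 1, so the
# early observations alias target and distractor.
TARGET, DISTRACTOR = (0, 1, 1), (0, 1, 0)
ACTIONS = ("look", "accept")


def run_episode(q, eps):
    """One episode: observe, then look or accept at each step."""
    pattern = random.choice((TARGET, DISTRACTOR))
    history, trace, reward = (), [], 0.0
    for obs in pattern:
        history += (obs,)  # memory of all observations so far
        # epsilon-greedy choice over utility estimates for this history
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda act: q[(history, act)][0]))
        trace.append((history, a))
        if a == "accept":
            # supervised reward on accept: +1 for target, -1 for distractor
            reward = 1.0 if pattern == TARGET else -1.0
            break
    return trace, reward


# q maps (history, action) -> [mean return, visit count]; combining
# all experiences with the same history averages over both target
# and distractor episodes, exposing the aliasing.
q = defaultdict(lambda: [0.0, 0])
for _ in range(5000):
    trace, r = run_episode(q, eps=0.3)
    for key in trace:
        mean, n = q[key]
        q[key] = [mean + (r - mean) / (n + 1), n + 1]

# Discard negative / near-zero utility regions (0.5 is an assumed
# threshold), keeping only histories where accepting is clearly good.
policy = {h: act for (h, act), (mean, n) in q.items()
          if act == "accept" and mean > 0.5}
print(policy)
```

After training, the pruned policy accepts only once the full disambiguating history of the target has been observed, while aliased prefixes and the distractor history are dropped, mirroring the FSM-like representation the abstract describes.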
Interval Research Technical Report 1997-045. Portions of this paper previously appeared in Advances in Neural Information Processing Systems 8, (NIPS '95), pp. 858-864, MIT Press, and Intelligent Robotic Systems,  M. Vidyasagar ed., pp. 73-80, Tata Press, 1998.

©1998 Interval Research Corp.
