Reinforcement Learning of Active Recognition Behaviors
Interval Research Corp.
1801 Page Mill Road Bl. C
Palo Alto, CA 94304
We show how a concise representation of active recognition
behavior (what observations to make to detect a given object) can be derived
from hidden-state reinforcement learning techniques. These learning techniques
can solve decision-process tasks that include perceptual observations,
defined formally as Partially Observable Markov Decision Processes (POMDPs).
We define recognition within a POMDP context, with an accept action indicating
recognition of the target as well as actions for adjusting the perceptual
apparatus or other effectors. An explicit supervised reward signal is provided
to the decision process whenever the accept action is performed. With sufficient
experience, a memory-based approach to reinforcement learning can find
optimal policies that discriminate target from distractor patterns despite
considerable perceptual aliasing at any given instant. To avoid perceptual
aliasing while learning, all similar experiences are combined when computing
the utility of a possible action, including experiences with both target
and distractor patterns. By discarding the representation of negative regions
of the utility space when learning is complete, and by collapsing duplicate
representations of positive regions, a representation similar to an augmented
Finite State Machine is obtained. We show the application of our method to
the task of recognizing human gesture performance that occurs at multiple
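The memory-based utility computation described above can be sketched in a toy setting. Everything below is an illustrative assumption rather than the report's implementation: the two three-step "gesture" patterns, the balanced training schedule, and all function names are invented for the example. The key idea it shows is that experiences with both target and distractor patterns are pooled per observation history, so perceptually aliased prefixes average out to zero utility, and the resulting policy accepts only in positive regions of the utility space.

```python
from collections import defaultdict

# Toy domain (an assumption, not from the report): a target and a distractor
# gesture that are perceptually aliased for their first two observations.
TARGET = ("up", "left", "down")
DISTRACTOR = ("up", "left", "up")

# Memory of experiences: observation history -> rewards received when the
# accept action was tried after that history. Pooling experiences from both
# target and distractor presentations is what resolves the aliasing.
memory = defaultdict(list)

def train(n_episodes=200):
    # alternate target/distractor presentations for a balanced memory
    for i in range(n_episodes):
        is_target = (i % 2 == 0)
        obs = TARGET if is_target else DISTRACTOR
        for t in range(1, len(obs) + 1):
            history = obs[:t]
            # explicit supervised reward, delivered only on accept
            memory[history].append(1.0 if is_target else -1.0)

def accept_utility(history):
    """Utility of accepting now: mean reward over all matching experiences."""
    rewards = memory.get(tuple(history), [])
    return sum(rewards) / len(rewards) if rewards else 0.0

def policy(history):
    # keep only positive regions of the utility space: accept there,
    # otherwise continue observing
    return "accept" if accept_utility(history) > 0.0 else "observe"

train()
print(policy(("up", "left")))          # aliased prefix -> "observe"
print(policy(("up", "left", "down")))  # full target sequence -> "accept"
```

Because the aliased prefix `("up", "left")` occurs in both target and distractor episodes, its pooled utility is exactly zero and the policy keeps observing; only the full target history has positive utility. Discarding the zero- and negative-utility histories and merging duplicate positive ones yields the compact, FSM-like structure the abstract describes.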
Interval Research Technical Report 1997-045. Portions of this paper previously
appeared in Advances in Neural Information Processing Systems 8
(NIPS '95), pp. 858-864, MIT Press, and in Intelligent Robotic Systems,
M. Vidyasagar, ed., pp. 73-80, Tata Press, 1998.
©1998 Interval Research Corp.