Angelic Hierarchical Planning
Jason Wolfe, Stuart J. Russell and Bhaskara Marthi1
Defense Advanced Research Projects Agency FA8750-05-2-0249 and Defense Advanced Research Projects Agency FA8750-07-D-0185 (subcontract 03-000219)
To make effective decisions in realistic environments, an agent must cope with huge search spaces and very long decision horizons. Humans seem to do this quite well, intelligently choosing the trillions of primitive motor commands that constitute a life. We manage this in large part by imposing hierarchical structure on our behavior, ranging from (billion-step?) high-level actions such as writing a conference paper down to (hundred-step?) motor programs for typing characters and saying phonemes. The principle benefit of using high-level actions seems obvious: high-level plans are much shorter, and should be much easier to find. In order to capture this benefit, however, an agent must be able to assess the quality of high-level plans. Doing so requires models for what high-level actions do, an issue that has not been satisfactorily addressed in the literature.
The goal of this project is to rectify this omission and thus develop general-purpose algorithms for automating effective high-level reasoning. Last year  we proposed an "angelic semantics" for high-level actions (HLAs), which correctly models the effects of a high-level plan by its sets of possible outcomes. This year , we extended our approach to include action costs. This extended semantics allows us to generate bounds on the costs of abstract plans, and thus compare, commit to, or discard them without first refining them down to the level of primitive motor commands, enabling the development of novel algorithms that do abstract lookahead with HLAs. First, in the offline planning setting, we developed Angelic Hierarchical A*, one of the first algorithms able to generate and commit to provably optimal abstract plans. In the online setting, we developed Angelic Hierarchical Learning Real-Time A*, which combines hierarchical lookahead with learning to efficiently choose good actions given very limited computation time.
Future work will consider extensions of our approach to concurrent, probabilistic, and partially observable settings, as well as applications to real(istic) problems such as robot control or real-time strategy games.
Figure 1: Left: a 4x4 "warehouse world" problem instance where the goal is to have On(c,t2) and On(a,c). This problem cannot be solved using fewer than 50 primitive actions. Right: Informal description of our HLAs for this domain. An optimal high-level solution for the problem at left requires only six applications of the Move HLA.
Figure 2: Total cost to the goal (in log scale) for two real-time search algorithms as a function of the allowed computation per environment step, averaged over three instances of the warehouse world domain. Algorithms are ordinary (non-hierarchical) Learning Real-Time A* (LRTA*) and our Angelic Hierarchical generalization, AHLRTA*. Whereas LRTA* does lookahead at the level of primitive actions, AHLRTA* does abstract lookahead using both primitive actions and HLAs. As can be seen, this enables it to choose very good actions with only very limited computation time, whereas even given 10 times as much computation time, LRTA* is still performing an order of magnitude away from optimal.
- Bhaskara Marthi, Stuart Russell, and Jason Wolfe. "Angelic Semantics for High-Level Actions." In Proc. ICAPS, 2007.
- Bhaskara Marthi, Stuart Russell, and Jason Wolfe. "Angelic Hierarchical Planning: Optimal and Online Algorithms." In Proc. ICAPS, 2008.
1MIT/Willow Garage Inc.