A natural approach to developing control architectures for complex tasks is to design multiple simple components, each addressing one aspect of the task domain, and provide a framework for combining the results. When an agent must learn these components by interacting with the environment, as in a Markov decision problem, the designer must take care not to sacrifice an optimal solution for a convenient representation. We propose an architecture that extends the well-known Q learning framework. It permits each component of a controller to compute a value function corresponding to a particular reward signal in the environment, with a supervisor executing the policy that optimizes the average value over all components. This allows each component to learn an estimate of its true value function in advance and improve it while the supervisor executes the policy implied by its agents, and in certain circumstances leads to an optimal policy. Our research applies this approach to complex motor learning problems, which require agents to balance a variety of constraints. We also plan to adapt this method to current techniques that allow researchers to specify partial programs solving Markov decision problems.
There is growing interest in building large knowledge bases (KBs) of everyday knowledge about the world, comprising tens or hundreds of thousands of assertions. But with large KBs, general-purpose reasoners used to answer queries tend to suffer from combinatorial explosion. One promising approach to grappling with the complexity of such KBs is to structure the content into multiple domain- or task-specific partitions. But how are the partitions created, and how do we exploit them in reasoning?
Our research concerns detecting and exploiting the implicit structure of knowledge to make reasoning more focused and efficient. In particular, we investigate how two closely related questions: (1) How can we automatically decompose a large KB into a network of tightly-focused, minimally-connected partitions? (2) Given a network of (possibly heterogeneous) knowledge systems, how can we achieve efficient global reasoning?
In our approach, a first-order or propositional theory is partitioned into tightly coupled subtheories according to the language of the axioms in the theory . This partitioning induces a graphical representation where a node represents a particular partition or subtheory and an arc represents the shared language between subtheories .
We developed a family of message-passing (MP) algorithms [1,3] for reasoning with partitions of axioms in propositional logic (PL) and first-order logic (FOL). We analyzed the computational benefit of our algorithms and detected those parameters of a partitioning that influence the efficiency of computation. These inference algorithms are sound and complete, i.e., provide all the correct answers and only those answers. The efficiency of our algorithms has been shown empirically to be superiour to widely used inference strategies.
In this work we investigate a general-purpose planning approach that finds plans in structured domains. Our algorithm factors a planning domain into subdomains and finds plans using this factorization. The runtime for our algorithm scales linearly with the size of the domain, and exponentially with the size of the largest subdomain and the amount of interaction between subdomains. We achieve this by limiting the amount of back-and-forth interactions that a plan allows between subdomains, and by choosing a factoring that minimizes this interaction. We do so using earlier work on graph partitioning and treewidth, but the user may also provide a factorization to the algorithm. We prove that our planning algorithm is sound and complete.
Many real-world planning domains have the decomposable structure that our approach exploits. We implemented and tested our algorithm in such domains and discovered that it finds plans in a fraction of the time taken by state-of-the-art planners, and scales to very large problems.
First-order probabilistic languages (FOPLs) combine the expressive power of first-order logic with the uncertainty handling of probability theory. Our group aims to apply these languages to interesting real-world domains. However, as we consider FOPLs of increasing complexity, inference grows difficult; and so, when we define useful languages, we must simultaneously work on developing tractable inference algorithms.
In recent years, several families of FOPLs have been proposed, but no clear consensus has emerged about what the "right" language is, either in general, or in specific application domains. We are investigating these proposed languages with respect to expressive power, efficiency of inference, and ease of modeling.
Much of our current research is inspired by Avi Pfeffer's probabilistic relational models (PRMs), a FOPL family based on semantic networks. We have extended PRMs in several ways, most notably by removing the unique-names assumption, and thus introducing uncertainty over the number of objects present in the system. This has permitted us to apply our work to problems such as data association (vehicle tracking), and, more recently, citation clustering. We plan to continue working on information extraction from the web, and also on problems from computational biology, such as modeling motifs in DNA sequences.
Exact inference in these domains is, in general, not tractable, and so we have developed an approximate approach based on the Markov Chain Monte Carlo algorithm, augmented by intelligent proposal distributions. We are also developing deterministic methods, in which the relational structure of the domain is used to motivate a structured variational approximation to the true posterior.
We provide algorithms for determining the belief state of an agent, i.e., its knowledge about the state of the world. We describe our domains using a logical action language that allows nondeterministic actions and partial observability. Our algorithms update an initial belief state with every execution of an action and when collecting observations. This iterative updating process is called logical filtering. Our algorithms are computationally superior to current methods that are used in nondeterministic planning. Several classes of dynamic systems that we identify allow particularly efficient filtering. In some cases our algorithms compute the filtering of action/observation sequences of arbitrary length efficiently while maintaining compact representation of belief states over time. Other cases allow efficient approximation of the belief state under reasonable conditions. Some of the properties of domains that we identify can be used to launch further investigation into efficient projection, execution monitoring, planning, and diagnosis.
We are exploring efficient techniques for reasoning under uncertainty in first-order domains. Given a knowledge base that contains sentences of first-order logic, each of which may be adorned with a degree of belief, i.e., a probability of truth, we would like to compute the degree of belief of a new sentence. Our technique involves transforming such a knowledge base into a probability distribution over possible worlds, which defines the degree of belief of every sentence of the language. Our current work involves constructing maximum entropy (or log-linear) distributions. We find that this approach has appealing inferential properties and a compact representation. In particular, maximum entropy distributions can be represented by graphical models whose structure corresponds to the structure of the original knowledge base, and conditional independences in such models can be leveraged to obtain efficient inference algorithms. Importantly, we have found that the complexity of probabilistic reasoning under these maximum entropy distributions is no harder than the corresponding deterministic inference problems.
Simultaneous localization and mapping is a fundamental problem in mobile robotics: while a robot navigates in an unknown environment, it must incrementally build a map of its surroundings and localize itself within that map. Traditional approaches to the problem are based upon Kalman filters, but suffer from complexity issues: first, the belief state grows quadratically in the size of the map; and second, the filtering operation can take time quadratic in the size of the map. I have developed a linear-space filter that maintains a tractable approximation of the filtered belief state as a thin junction tree. The junction tree grows under measurement and motion updates and is periodically "thinned" to remain tractable. The time complexity of the filter operation is linear in the size of the map, and further approximations permit constant-time approximate filtering.