Autonomic Reactive Systems via Online Learning

Sanjit A. Seshia.
IEEE International Conference on Autonomic Computing, June 2007 (to appear).

Reactive systems are those that maintain an ongoing interaction with their environment at a speed dictated by the latter. Examples of such systems include web servers, network routers, sensor nodes, and autonomous robots. While we increasingly rely on the correct operation of reactive systems, it is becoming ever harder to deploy bug-free systems.

In this paper, we propose a formal framework for automatically recovering a class of reactive systems from run-time failures. This class of systems comprises those whose executions can be divided into rounds such that each round performs a new unit of work. We show how the system recovery problem can be modeled as an instance of an online learning problem. On the theoretical side, we give a strategy that is near-optimal, and state and prove bounds on its performance. On the practical side, we demonstrate the effectiveness of our approach through the case study of a buggy network monitor. Our results indicate that online learning provides a useful basis for constructing autonomic reactive systems.
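
To make the idea of round-based recovery as online learning concrete, the following is a minimal sketch and not the strategy from the paper. It uses the standard Hedge (multiplicative weights) rule to choose one recovery action per round. The action names, the loss function, the learning rate `eta`, and the full-information assumption (the loss of every action is observable after each round, with losses in [0, 1]) are all hypothetical choices made for illustration.

```python
import math
import random


def hedge_recovery(actions, loss_fn, rounds, eta=0.5, seed=0):
    """Choose one recovery action per round using the Hedge rule.

    Assumes (hypothetically) that loss_fn(t, action) returns a loss in
    [0, 1] and can be evaluated for every action after each round.
    """
    rng = random.Random(seed)
    weights = {a: 1.0 / len(actions) for a in actions}
    total_loss = 0.0
    for t in range(rounds):
        # Sample an action with probability proportional to its weight.
        chosen = rng.choices(actions, weights=[weights[a] for a in actions])[0]
        total_loss += loss_fn(t, chosen)
        # Shrink each action's weight exponentially in its loss this round.
        for a in actions:
            weights[a] *= math.exp(-eta * loss_fn(t, a))
        # Renormalize so the weights stay a probability distribution.
        norm = sum(weights.values())
        for a in actions:
            weights[a] /= norm
    return weights, total_loss


# Hypothetical recovery actions and made-up losses, for illustration only.
ACTIONS = ["retry", "skip_input", "restart"]


def example_loss(t, action):
    return 0.1 if action == "skip_input" else 1.0


weights, total_loss = hedge_recovery(ACTIONS, example_loss, rounds=50)
print(max(weights, key=weights.get))  # skip_input under these made-up losses
```

Under these made-up losses the weights concentrate on the action that loses least, which is the behavior a regret bound for an online learning strategy formalizes. A real deployment would typically observe only the outcome of the action actually taken, which calls for a bandit-style variant rather than this full-information sketch.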

Paper available in PDF format.