Electrical Engineering and Computer Sciences

Implementation of Real-Time Distributed Discrete-Event Execution with Fault Tolerance

Thomas Huining Feng and Edward A. Lee

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2007-133
November 8, 2007

http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-133.pdf

We build on PTIDES, a programming model for distributed embedded systems that uses discrete-event (DE) models as program specifications. PTIDES improves on distributed DE execution by allowing more concurrent event processing without backtracking. This paper discusses the general execution strategy for PTIDES and provides two feasible implementations. This execution strategy is then extended with tolerance for hardware errors. We take a program transformation approach to automatically enhance DE models with incremental checkpointing and state recovery functionality. Our fault tolerance mechanism is lightweight, has low overhead, and requires very little human intervention. We incorporate this mechanism into PTIDES for efficient execution of fault-tolerant real-time distributed DE systems.

Author Comments: Accepted for publication in RTAS 2008, April 2008.


BibTeX citation:

@techreport{Feng:EECS-2007-133,
    Author = {Feng, Thomas Huining and Lee, Edward A.},
    Title = {Implementation of Real-Time Distributed Discrete-Event Execution with Fault Tolerance},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2007},
    Month = {Nov},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-133.html},
    Number = {UCB/EECS-2007-133},
    Note = {Accepted for publication in RTAS 2008, April 2008.},
    Abstract = {We build on PTIDES, a programming model for distributed embedded systems that uses discrete-event (DE) models as program specifications. PTIDES improves on distributed DE execution by allowing more concurrent event processing without backtracking.

This paper discusses the general execution strategy for PTIDES, and provides two feasible implementations. This execution strategy is then extended with tolerance for hardware errors. We take a program transformation approach to automatically enhance DE models with incremental checkpointing and state recovery functionality. Our fault tolerance mechanism is lightweight and has low overhead. It requires very little human intervention. We incorporate this mechanism into PTIDES for efficient execution of fault-tolerant real-time distributed DE systems.}
}

EndNote citation:

%0 Report
%A Feng, Thomas Huining
%A Lee, Edward A.
%T Implementation of Real-Time Distributed Discrete-Event Execution with Fault Tolerance
%I EECS Department, University of California, Berkeley
%D 2007
%8 November 8
%@ UCB/EECS-2007-133
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-133.html
%F Feng:EECS-2007-133