Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

An Empirical Study of the Control and Data Planes (or Control Plane Determinism is Key for Replay Debugging Datacenter Applications)

Gautam Altekar and Ion Stoica

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2010-85
May 21, 2010

http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-85.pdf

Replay debugging systems enable the reproduction and debugging of non-deterministic failures in production application runs. However, no existing replay system is suitable for datacenter applications like Cassandra, Hadoop, and Hypertable. For these large-scale, distributed, and data-intensive programs, existing methods either incur excessive production overheads or don’t scale to multi-node, terabyte-scale processing. In this position paper, we hypothesize and empirically verify that control plane determinism is the key to record-efficient and high-fidelity replay of datacenter applications. The key idea behind control plane determinism is that debugging does not always require a precise replica of the original datacenter run. Instead, it often suffices to produce some run that exhibits the original behavior of the control plane: the application code responsible for controlling and managing data flow through a datacenter system.


BibTeX citation:

@techreport{Altekar:EECS-2010-85,
    Author = {Altekar, Gautam and Stoica, Ion},
    Title = {An Empirical Study of the Control and Data Planes (or Control Plane Determinism is Key for Replay Debugging Datacenter Applications)},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2010},
    Month = {May},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-85.html},
    Number = {UCB/EECS-2010-85},
    Abstract = {Replay debugging systems enable the reproduction and
debugging of non-deterministic failures in production
application runs. However, no existing replay system
is suitable for datacenter applications like Cassandra,
Hadoop, and Hypertable. For these large-scale, distributed,
and data-intensive programs, existing methods
either incur excessive production overheads or don’t
scale to multi-node, terabyte-scale processing.

In this position paper, we hypothesize and empirically
verify that control plane determinism is the key to record-efficient
and high-fidelity replay of datacenter applications.
The key idea behind control plane determinism is
that debugging does not always require a precise replica
of the original datacenter run. Instead, it often suffices
to produce some run that exhibits the original behavior
of the control plane: the application code responsible for
controlling and managing data flow through a datacenter
system.}
}

EndNote citation:

%0 Report
%A Altekar, Gautam
%A Stoica, Ion
%T An Empirical Study of the Control and Data Planes (or Control Plane Determinism is Key for Replay Debugging Datacenter Applications)
%I EECS Department, University of California, Berkeley
%D 2010
%8 May 21
%@ UCB/EECS-2010-85
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-85.html
%F Altekar:EECS-2010-85