Tyson Condie

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2011-71

June 4, 2011

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-71.pdf

Building system software is a notoriously complex and arduous endeavor. Developing tools and methodologies for practical system software engineering has long been an active area of research. This thesis explores system software development through the lens of a declarative, data-centric programming language that can succinctly express high-level system specifications and be directly compiled to executable code. By unifying specification and implementation, our approach avoids the common problem of implementations diverging from specifications over time. In addition, we show that using a declarative language often results in drastic reductions in code size (100× and more) relative to procedural languages like Java and C++. We demonstrate these advantages by implementing a host of functionalities at various levels of the system hierarchy, including network protocols, query optimizers, and scheduling policies. In addition to providing a compact and optimized implementation, we demonstrate that our declarative implementations often map very naturally to traditional specifications: in many cases they are line-by-line translations of published pseudocode.

We started this work with the hypothesis that declarative languages — originally developed for the purposes of data management and querying — could be fruitfully adapted to the specification and implementation of core system infrastructure. A similar argument had been made for networking protocols a few years earlier [Loo thesis]. However, our goals were quite different: we wanted to explore a broader range of algorithms and functionalities (dynamic programming, scheduling, program rewriting, and system auditing) that were part of complex, real-world software systems. We identified two existing system components — query optimizers in a DBMS and task schedulers in a cloud computing system — that we felt would be better specified via a declarative language. Given our interest in delivering real-world software, a key challenge was identifying the right system boundary that would permit meaningful declarative implementations to coexist within existing imperative system architectures. We found that relations were a natural boundary for maintaining the ongoing system state on which the imperative and declarative code was based, and provided an elegant way to model system architectures.

This thesis explores the boundaries of declarative systems via two projects. We begin with Evita Raced; an extensible compiler for the Overlog language used in our declarative networking system, P2. Evita Raced is a metacompiler — an Overlog compiler written in Overlog — that integrates seamlessly with the P2 dataflow architecture. We first describe the minimalist design of Evita Raced, including its extensibility interfaces and its reuse of the P2 data model and runtime engine. We then demonstrate that a declarative language like Overlog is well-suited to expressing traditional and novel query optimizations as well as other program manipulations, in a compact and natural fashion. Following Evita Raced, we describe the initial work in BOOM Analytics, which began as a large-scale experiment at building “cloud” software in a declarative language. Specifically, we used the Overlog language to implement a “Big Data” analytics stack that is API-compatible with the Hadoop MapReduce architecture and provides comparable performance. We extended our declarative version of Hadoop with complex distributed features that remain absent in the stock Hadoop Java implementation, including alternative scheduling policies, online aggregation, continuous queries, and unique monitoring and debugging facilities. We present quantitative and anecdotal results from our experience, providing concrete evidence that both data-centric design and declarative languages can substantially simplify systems programming.

Advisors: Joseph M. Hellerstein


BibTeX citation:

@phdthesis{Condie:EECS-2011-71,
    Author= {Condie, Tyson},
    Title= {Declarative Systems},
    School= {EECS Department, University of California, Berkeley},
    Year= {2011},
    Month= {Jun},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-71.html},
    Number= {UCB/EECS-2011-71},
    Abstract= {Building system software is a notoriously complex and arduous endeavor. Developing tools and methodologies for practical system software engineering has long been an active area of research. This thesis explores system software development through the lens of a declarative, data-centric programming language that can succinctly express high-level system specifications and be directly compiled to executable code. By unifying specification and implementation, our approach avoids the common problem of implementations diverging from specifications over time. In addition, we show that using a declarative language often results in drastic reductions in code size (100× and more) relative to procedural languages like Java and C++. We demonstrate these advantages by implementing a host of functionalities at various levels of the system hierarchy, including network protocols, query optimizers, and scheduling policies. In addition to providing a compact and optimized implementation, we demonstrate that our declarative implementations often map very naturally to traditional specifications: in many cases they are line-by-line translations of published pseudocode. 

We started this work with the hypothesis that declarative languages — originally developed for the purposes of data management and querying — could be fruitfully adapted to the specification and implementation of core system infrastructure. A similar argument had been made for networking protocols a few years earlier [Loo thesis]. However, our goals were quite different: we wanted to explore a broader range of algorithms and functionalities (dynamic programming, scheduling, program rewriting, and system auditing) that were part of complex, real-world software systems. We identified two existing system components — query optimizers in a DBMS and task schedulers in a cloud computing system — that we felt would be better specified via a declarative language. Given our interest in delivering real-world software, a key challenge was identifying the right system boundary that would permit meaningful declarative implementations to coexist within existing imperative system architectures. We found that relations were a natural boundary for maintaining the ongoing system state on which the imperative and declarative code was based, and provided an elegant way to model system architectures.

This thesis explores the boundaries of declarative systems via two projects. We begin with Evita Raced; an extensible compiler for the Overlog language used in our declarative networking system, P2. Evita Raced is a metacompiler — an Overlog compiler written in Overlog — that integrates seamlessly with the P2 dataflow architecture. We first describe the minimalist design of Evita Raced, including its extensibility interfaces and its reuse of the P2 data model and runtime engine. We then demonstrate that a declarative language like Overlog is well-suited to expressing traditional and novel query optimizations as well as other program manipulations, in a compact and natural fashion. Following Evita Raced, we describe the initial work in BOOM Analytics, which began as a large-scale experiment at building “cloud” software in a declarative language. Specifically, we used the Overlog language to implement a “Big Data” analytics stack that is API-compatible with the Hadoop MapReduce architecture and provides comparable performance. We extended our declarative version of Hadoop with complex distributed features that remain absent in the stock Hadoop Java implementation, including alternative scheduling policies, online aggregation, continuous queries, and unique monitoring and debugging facilities. We present quantitative and anecdotal results from our experience, providing concrete evidence that both data-centric design and declarative languages can substantially simplify systems programming.},
}

EndNote citation:

%0 Thesis
%A Condie, Tyson 
%T Declarative Systems
%I EECS Department, University of California, Berkeley
%D 2011
%8 June 4
%@ UCB/EECS-2011-71
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-71.html
%F Condie:EECS-2011-71