Stephen Dawson-Haggerty and Ariel Rabkin
Any large distributed system has at least one, and often several, subsystems for managing and aggregating monitoring data, particularly telemetry, trace data, and alarms. We believe that many of these monitoring and tracing applications can be structured as a set of modules that generate and consume streams of messages. By refactoring the common features of these applications into a shared framework, we allow users to realize the usual benefits of shared code: shared enhancements and reduced development effort.
To aid code reuse, modules share a common API and message format, so they can be chained together to form complex processing pipelines. These pipelines are assembled from a declarative specification, which allows a simple, high-level description of the application. Meanwhile, the framework transparently handles the "hard" problems of reliability and deployment, decoupling application-specific computation from the mechanism of deploying it.
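As a rough illustration of the structure described above, the following Python sketch shows modules that share one message format and one calling convention, chained into a pipeline from a declarative specification. It is not the framework's actual API: every name here (`Message`, `threshold_alarm`, `keep_kind`, `REGISTRY`, `SPEC`, `build_pipeline`) is hypothetical, and the reliability and deployment concerns that the framework is meant to handle are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Message:
    """Hypothetical common message format shared by all modules."""
    source: str    # where the message originated, e.g. a host name
    kind: str      # e.g. "telemetry", "trace", or "alarm"
    payload: dict  # application-specific content


# Hypothetical common module API: consume a message stream, produce a message stream.
Module = Callable[[Iterable[Message]], Iterator[Message]]


def threshold_alarm(metric: str, limit: float) -> Module:
    """Pass every message through and emit an alarm when a metric exceeds a limit."""
    def run(stream: Iterable[Message]) -> Iterator[Message]:
        for msg in stream:
            yield msg
            value = msg.payload.get(metric)
            if msg.kind == "telemetry" and value is not None and value > limit:
                yield Message(msg.source, "alarm", {"metric": metric, "value": value})
    return run


def keep_kind(kind: str) -> Module:
    """Keep only messages of the given kind."""
    def run(stream: Iterable[Message]) -> Iterator[Message]:
        return (msg for msg in stream if msg.kind == kind)
    return run


# Assumed registry that maps names used in a specification to module factories.
REGISTRY: Dict[str, Callable[..., Module]] = {
    "threshold_alarm": threshold_alarm,
    "keep_kind": keep_kind,
}

# Declarative specification: an ordered list of module names and their arguments.
SPEC: List[dict] = [
    {"module": "threshold_alarm", "args": {"metric": "cpu", "limit": 0.9}},
    {"module": "keep_kind", "args": {"kind": "alarm"}},
]


def build_pipeline(spec: List[dict]) -> Module:
    """Instantiate each module named in the spec and chain them in order."""
    stages = [REGISTRY[entry["module"]](**entry.get("args", {})) for entry in spec]

    def run(stream: Iterable[Message]) -> Iterator[Message]:
        for stage in stages:
            stream = stage(stream)
        return iter(stream)

    return run


if __name__ == "__main__":
    readings = [
        Message("host1", "telemetry", {"cpu": 0.50}),
        Message("host2", "telemetry", {"cpu": 0.95}),
    ]
    for alarm in build_pipeline(SPEC)(readings):
        print(alarm)
```

Because each stage accepts and returns the same stream type, changing the application amounts to editing `SPEC` rather than the module code, which is the kind of decoupling the declarative specification is intended to provide.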