2009 Research Summary

RIOT Backplane/Chukwa/Monitoring Tools

Ariel Rabkin and Andrew Konwinski

Any large distributed system will have one, and often several, systems for managing and aggregating monitoring data such as telemetry, trace data, and alarms. A system such as the RAD Lab's proposed director has unusually demanding data-collection needs: it will require both large-scale batch processing to detect trends and anomalies and low-latency processing for real-time control.

We're working on a system, called Chukwa, to deliver on both goals. Chukwa decouples the application-specific logic for collection and analysis from the problem of scalable processing and data retention. It does this by leveraging Hadoop: MapReduce for scalable batch analysis and the Hadoop Distributed File System (HDFS) for long-term data storage. Indexing and low-latency analysis are major research priorities for us. Chukwa is being developed in cooperation with Yahoo!.
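The decoupling described above can be illustrated with a minimal Java sketch. It does not use Chukwa's or Hadoop's actual APIs: `RecordParser`, `runBatch`, `HOST_METRIC`, and the "host metric value" log format are all hypothetical, and an in-memory loop stands in for a MapReduce job. The point is the separation of concerns: the application supplies only a parser, while the generic engine owns grouping and aggregation, so the engine could in principle be replaced by a scalable job without touching the application logic.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not Chukwa's API): application-specific parsing sits
// behind an interface; a generic batch engine handles grouping and aggregation.
public class DecoupledPipeline {

    /** Application-specific logic: one raw line -> (key, value), or null to skip it. */
    interface RecordParser {
        Map.Entry<String, Long> parse(String rawLine);
    }

    /** Generic engine: an in-memory stand-in for a MapReduce job (map, group by key, sum). */
    static Map<String, Long> runBatch(List<String> rawLines, RecordParser parser) {
        Map<String, Long> totals = new TreeMap<>(); // sorted keys, like reducer output
        for (String line : rawLines) {
            Map.Entry<String, Long> kv = parser.parse(line); // "map" step
            if (kv == null) {
                continue; // lines the parser rejects are dropped
            }
            totals.merge(kv.getKey(), kv.getValue(), Long::sum); // "group + reduce" step
        }
        return totals;
    }

    /** Example parser for a hypothetical "host metric value" line format. */
    static final RecordParser HOST_METRIC = line -> {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3) {
            return null;
        }
        try {
            return Map.entry(fields[0] + "/" + fields[1], Long.parseLong(fields[2]));
        } catch (NumberFormatException e) {
            return null;
        }
    };

    public static void main(String[] args) {
        List<String> logs = List.of(
                "node1 bytes_out 100",
                "node2 bytes_out 40",
                "node1 bytes_out 25",
                "garbled line");
        // prints {node1/bytes_out=125, node2/bytes_out=40}
        System.out.println(runBatch(logs, HOST_METRIC));
    }
}
```

Note that the malformed line is rejected by the application's parser, not by the engine; the engine stays ignorant of any log format.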