Electrical Engineering and Computer Sciences


UC Berkeley


2008 Research Summary

Monitoring Hadoop through Tracing

Andrew Konwinski, Matei Zaharia, Randy H. Katz, Ion Stoica and Scott Shenker

Many of today's most important computer applications, such as web search, online shopping, and email, run in large data centers composed of thousands of failure-prone commodity machines. Understanding and managing these applications is challenging. We show that tracing can be used to profile and debug massively distributed applications by instrumenting Hadoop [1], an open-source distributed file system and parallel computation framework, with Berkeley's X-Trace framework [2]. With tracing in place, we can readily evaluate modifications to Hadoop, such as changes to the distributed file system and load-balancing algorithms. We also intend to apply statistical machine learning techniques to Hadoop traces to detect performance problems and failures automatically, further reducing the management cost of data centers and providing early warning of impending problems.