Monitoring Hadoop through Tracing
Andrew Konwinski, Matei Zaharia, Randy H. Katz, Ion Stoica and Scott Shenker
Many of today's most important computer applications, such as web search, online shopping, and email, run on large data centers composed of thousands of failure-prone commodity machines. Understanding and managing these applications is a challenging task. We show that tracing can be used to profile and debug massively distributed applications by instrumenting Hadoop, an open source distributed file system and parallel computation framework. For this purpose, we use Berkeley's X-Trace framework. Implementing tracing in Hadoop lets us easily evaluate modifications to Hadoop, such as changes to the distributed file system and load balancing algorithms. We also intend to apply statistical machine learning techniques to traces of Hadoop to automatically detect performance problems and failures, further reducing the management cost of data centers and providing early warning of impending problems.
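To make the tracing approach concrete, the sketch below illustrates the general idea behind X-Trace-style instrumentation: every traced operation carries a task ID shared by the whole distributed request plus its own operation ID, and each report names the operation that causally preceded it, so a backend can rebuild the request's execution graph. This is an illustrative sketch in Python, not the actual X-Trace API or the Hadoop instrumentation; all names (`TraceContext`, `log_event`, `build_graph`) are hypothetical.

```python
# Illustrative sketch of trace-metadata propagation (hypothetical API,
# not the real X-Trace library): each operation records its task ID,
# its own operation ID, and the ID of its causal parent.

import itertools
from collections import defaultdict

_counter = itertools.count(1)  # source of unique operation IDs

class TraceContext:
    """Metadata propagated along with a request (assumed shape)."""
    def __init__(self, task_id, op_id):
        self.task_id = task_id   # identifies the whole distributed request
        self.op_id = op_id       # identifies this particular operation

reports = []  # stand-in for reports sent to a trace-collection backend

def log_event(label, parent=None, task_id=None):
    """Record one event and return a context to pass downstream."""
    ctx = TraceContext(task_id or parent.task_id, next(_counter))
    reports.append({
        "task": ctx.task_id,
        "op": ctx.op_id,
        "parent": parent.op_id if parent else None,
        "label": label,
    })
    return ctx

def build_graph(task_id):
    """Reconstruct the causal graph of one request from its reports."""
    edges = defaultdict(list)
    for r in reports:
        if r["task"] == task_id and r["parent"] is not None:
            edges[r["parent"]].append(r["op"])
    return dict(edges)

# Simulated request flowing through a master and two workers:
root = log_event("client: submit job", task_id="task-42")
m = log_event("master: schedule", parent=root)
w1 = log_event("worker-1: map", parent=m)
w2 = log_event("worker-2: map", parent=m)
r1 = log_event("master: reduce", parent=m)
```

Once such a graph is collected per request, profiling and debugging reduce to graph analysis: slow or failed requests can be compared structurally against healthy ones, which is also the kind of input the statistical machine learning techniques mentioned above would consume.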