Mining Console Logs for Large-Scale Systems Runtime Problem Detection
Wei Xu, Armando Fox and David A. Patterson
RAD Lab Industrial Affiliates, UC Discovery, UC MICRO
Large-scale Internet services today run on large server clusters. A recent trend is to run these services in virtualized cloud computing environments such as Amazon’s Elastic Compute Cloud (EC2). The scale and complexity of these services make it very difficult to design, deploy, and maintain a monitoring system. In this project, we propose returning to the console log, the natural tracing information included in almost every software system, for monitoring and problem detection.
Console logs are free-text logs generated by explicit log printing statements in programs. They have been used since the early days of computing in almost every software system. They are flexible and highly expressive: developers use them to report internal state, trace system execution paths, and provide run-time statistics. Given their popularity, console logs are likely to remain an essential tool for system monitoring and diagnostics. Unfortunately, operators usually ignore console logs, mainly because their highly unstructured nature makes them very hard to understand. Existing approaches to console log analysis usually require the operator to specify keyword queries, which are often hard to formulate.
We propose a general approach to mining console logs to detect runtime problems in large-scale systems without requiring queries. We combine a novel log structure analysis method, which makes use of program source code, with machine learning algorithms to automatically select the most interesting parts of console logs and to visualize the results. We have demonstrated the effectiveness of our method on open source systems such as Hadoop and Sun’s Project Darkstar game server.
Figure 1: System overview
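To make the idea of source-assisted log parsing more concrete, the following is a minimal sketch, not the method used in this work. It assumes that printf-style message templates have already been recovered from the log printing statements in the source code, turns each template into a regular expression, and groups parsed messages by their first variable. The template names, the sample log lines, and the final "differs from the most common count vector" check are all hypothetical; the last is only a crude stand-in for the machine learning step.

```python
import re
from collections import Counter, defaultdict

# Hypothetical templates, as might be recovered from log printing
# statements in source code (printf-style placeholders are an assumption).
TEMPLATES = {
    "receive_block": "Receiving block %s src: %s dest: %s",
    "received_block": "Received block %s of size %d from %s",
    "delete_block": "Deleting block %s file %s",
}

PLACEHOLDER = re.compile(r"%[sd]")


def template_to_regex(template):
    """Build a regex with one capture group per placeholder in the template."""
    literals = PLACEHOLDER.split(template)
    kinds = PLACEHOLDER.findall(template)
    pattern = re.escape(literals[0])
    for kind, literal in zip(kinds, literals[1:]):
        group = r"(-?\d+)" if kind == "%d" else r"(\S+)"
        pattern += group + re.escape(literal)
    return re.compile(pattern + r"$")


PATTERNS = {name: template_to_regex(t) for name, t in TEMPLATES.items()}


def parse_line(line):
    """Return (message_type, variables) for a log line, or None if unmatched."""
    for name, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            return name, match.groups()
    return None


def count_vectors(lines):
    """Count message types per identifier (the first variable of each message)."""
    counts = defaultdict(Counter)
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            name, variables = parsed
            counts[variables[0]][name] += 1
    return counts


def unusual_groups(counts):
    """Flag identifiers whose count vector differs from the most common one.

    This is a deliberately simple placeholder for a real learning algorithm.
    """
    names = sorted(TEMPLATES)
    vectors = {g: tuple(c[n] for n in names) for g, c in counts.items()}
    if not vectors:
        return []
    typical, _ = Counter(vectors.values()).most_common(1)[0]
    return sorted(g for g, v in vectors.items() if v != typical)


# Made-up log lines for illustration only; not taken from any real system.
SAMPLE_LINES = [
    "20:35:18 INFO Receiving block blk_1 src: /10.0.0.1 dest: /10.0.0.2",
    "20:35:19 INFO Received block blk_1 of size 1024 from /10.0.0.1",
    "20:35:20 INFO Receiving block blk_2 src: /10.0.0.3 dest: /10.0.0.2",
    "20:35:21 INFO Received block blk_2 of size 2048 from /10.0.0.3",
    "20:35:22 INFO Receiving block blk_3 src: /10.0.0.1 dest: /10.0.0.4",
    "20:35:23 INFO Deleting block blk_3 file /data/blk_3",
]

if __name__ == "__main__":
    print(unusual_groups(count_vectors(SAMPLE_LINES)))
```

In this sketch no keyword query is written by the operator: the structure comes from the templates, and the flagged identifiers come from comparing per-identifier message counts. A real system would replace the final comparison with a proper statistical or learning-based detector.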