Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

Statistical Workloads for Energy Efficient MapReduce

Yanpei Chen, Archana Sulochana Ganapathi, Armando Fox, Randy H. Katz and David A. Patterson

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2010-6
January 21, 2010

http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-6.pdf

Energy efficiency is a growing concern in modern datacenters. As Internet services increasingly rely on MapReduce workloads to fuel their flagship businesses, there is a growing need for better MapReduce energy efficiency evaluation mechanisms. We present a statistics-driven workload generation framework that distills summary statistics from production MapReduce traces and realistically reproduces representative workloads. These workloads help us evaluate design decisions with regard to scale, configuration, scheduling, and other issues. We use this framework to identify specific suggestions to improve MapReduce energy efficiency. Our key finding is that evaluations using trace-driven workloads reverse current design priorities in optimizing for data-intensive synthetic jobs.


BibTeX citation:

@techreport{Chen:EECS-2010-6,
    Author = {Chen, Yanpei and Ganapathi, Archana Sulochana and Fox, Armando and Katz, Randy H. and Patterson, David A.},
    Title = {Statistical Workloads for Energy Efficient MapReduce},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2010},
    Month = {Jan},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-6.html},
    Number = {UCB/EECS-2010-6},
    Abstract = {Energy efficiency is a growing concern in modern datacenters. As Internet services increasingly rely on MapReduce workloads to fuel their flagship businesses, there is a growing need for better MapReduce energy efficiency evaluation mechanisms. We present a statistics-driven workload generation framework that distills summary statistics from production MapReduce traces and realistically reproduces representative workloads. These workloads help us evaluate design decisions with regard to scale, configuration, scheduling, and other issues. We use this framework to identify specific suggestions to improve MapReduce energy efficiency. Our key finding is that evaluations using trace-driven workloads reverse current design priorities in optimizing for data-intensive synthetic jobs.}
}

EndNote citation:

%0 Report
%A Chen, Yanpei
%A Ganapathi, Archana Sulochana
%A Fox, Armando
%A Katz, Randy H.
%A Patterson, David A.
%T Statistical Workloads for Energy Efficient MapReduce
%I EECS Department, University of California, Berkeley
%D 2010
%8 January 21
%@ UCB/EECS-2010-6
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-6.html
%F Chen:EECS-2010-6