Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

Scalable Scheduling for Sub-Second Parallel Jobs

Patrick Wendell

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2013-79
May 16, 2013

http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-79.pdf

Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. However, scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for cluster schedulers, which will need to place millions of tasks per second on appropriate nodes while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a real cluster and demonstrate that Sparrow performs within 14% of an ideal scheduler.

Advisor: Ion Stoica


BibTeX citation:

@mastersthesis{Wendell:EECS-2013-79,
    Author = {Wendell, Patrick},
    Title = {Scalable Scheduling for Sub-Second Parallel Jobs},
    School = {EECS Department, University of California, Berkeley},
    Year = {2013},
    Month = {May},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-79.html},
    Number = {UCB/EECS-2013-79},
    Abstract = {Large-scale data analytics frameworks are shifting towards shorter task durations and larger
degrees of parallelism to provide low latency. However, scheduling highly parallel jobs that complete
in hundreds of milliseconds poses a major challenge for cluster schedulers, which will need
to place millions of tasks per second on appropriate nodes while offering millisecond-level latency
and high availability. We demonstrate that a decentralized, randomized sampling approach
provides near-optimal performance while avoiding the throughput and availability limitations of a
centralized design. We implement and deploy our scheduler, Sparrow, on a real cluster and demonstrate
that Sparrow performs within 14% of an ideal scheduler.}
}

EndNote citation:

%0 Thesis
%A Wendell, Patrick
%T Scalable Scheduling for Sub-Second Parallel Jobs
%I EECS Department, University of California, Berkeley
%D 2013
%8 May 16
%@ UCB/EECS-2013-79
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-79.html
%F Wendell:EECS-2013-79