Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

Communication-Efficient Distributed Stochastic Gradient Descent with Butterfly Mixing

Huasha Zhao and John F. Canny

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2012-96
May 11, 2012

http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-96.pdf

Stochastic gradient descent is a widely used method for finding locally-optimal models in machine learning and data mining. However, it is naturally a sequential algorithm, and parallelization involves severe compromises because the cost of synchronizing across a cluster is much larger than the time required to compute an optimal-sized gradient step. Here we explore butterfly mixing, where gradient steps are interleaved with the k stages of a butterfly network on 2^k nodes. UDP-based butterfly mix steps should be extremely fast and failure-tolerant, and convergence is almost as fast as a full mix (AllReduce) on every step.
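The scheme in the abstract can be illustrated with a minimal single-process simulation. This is a hypothetical sketch, not the authors' implementation: each of 2^k nodes holds a model vector, takes a local gradient step, and then averages its model with the partner obtained by flipping one address bit per stage (node i pairs with i XOR 2^s at stage s). The toy gradient step and learning rate are assumptions for illustration.

```python
# Minimal simulation of butterfly mixing on 2^k nodes (single process).
# Hypothetical sketch: each node holds a model vector; in stage s, node i
# averages its model with partner i XOR 2^s, taking one local SGD step
# before each mix stage (gradient steps interleaved with the k stages).

def grad_step(model, target, lr=0.1):
    # Toy least-squares gradient step pulling the model toward `target`.
    return [m - lr * (m - t) for m, t in zip(model, target)]

def butterfly_mix(models, stage):
    """One butterfly stage: every node averages with its stage-s partner."""
    n = len(models)
    mixed = [None] * n
    for i in range(n):
        p = i ^ (1 << stage)  # butterfly partner: flip address bit `stage`
        mixed[i] = [(a + b) / 2 for a, b in zip(models[i], models[p])]
    return mixed

k = 3
n = 2 ** k                                # 8 nodes
models = [[float(i)] for i in range(n)]   # each node starts with a distinct model

# Interleave a local gradient step with each of the k mix stages.
for stage in range(k):
    models = [grad_step(m, [0.0]) for m in models]  # local step on each node's shard
    models = butterfly_mix(models, stage)

# After k stages, every node's model reflects contributions from all 2^k nodes;
# in this symmetric toy setting, all models coincide exactly.
```

Because each stage halves the number of distinct model groups, k stages suffice to spread every node's contribution to all 2^k nodes, with each node sending only one message per stage instead of the full AllReduce on every gradient step.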

Advisor: John F. Canny


BibTeX citation:

@mastersthesis{Zhao:EECS-2012-96,
    Author = {Zhao, Huasha and Canny, John F.},
    Title = {Communication-Efficient Distributed Stochastic Gradient Descent with Butterfly Mixing},
    School = {EECS Department, University of California, Berkeley},
    Year = {2012},
    Month = {May},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-96.html},
    Number = {UCB/EECS-2012-96},
    Abstract = {Stochastic gradient descent is a widely used method for finding locally-optimal models in machine learning and data mining. However, it is naturally a sequential algorithm, and parallelization involves severe compromises because the cost of synchronizing across a cluster is much larger than the time required to compute an optimal-sized gradient step. Here we explore butterfly mixing, where gradient steps are interleaved with the k stages of a butterfly network on 2^k nodes. UDP-based butterfly mix steps should be extremely fast and failure-tolerant, and convergence is almost as fast as a full mix (AllReduce) on every step.}
}

EndNote citation:

%0 Thesis
%A Zhao, Huasha
%A Canny, John F.
%T Communication-Efficient Distributed Stochastic Gradient Descent with Butterfly Mixing
%I EECS Department, University of California, Berkeley
%D 2012
%8 May 11
%@ UCB/EECS-2012-96
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-96.html
%F Zhao:EECS-2012-96