Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

Minimizing Communication in Numerical Linear Algebra

Grey Ballard, James Demmel, Olga Holtz and Oded Schwartz

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2011-15
February 28, 2011

http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-15.pdf

In 1981 Hong and Kung proved a lower bound on the amount of communication (amount of data moved between a small, fast memory and large, slow memory) needed to perform dense, n-by-n matrix multiplication using the conventional O(n^3) algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it to the parallel case (where communication means the amount of data moved between processors). In both cases the lower bound may be expressed as Omega(#arithmetic operations / M^(1/2)), where M is the size of the fast memory (or local memory in the parallel case).

Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, LDL^T factorization, QR factorization, the Gram-Schmidt algorithm, and algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra.

The proof works for dense or sparse matrices, and for sequential or parallel algorithms. In addition to lower bounds on the amount of data moved (bandwidth-cost), we get lower bounds on the number of messages required to move it (latency-cost).

We extend our lower bound technique to compositions of linear algebra operations (like computing powers of a matrix), to decide whether it is enough to call a sequence of simpler optimal algorithms (like matrix multiplication) to minimize communication, or if we can do better. We give examples of both. We also show how to extend our lower bounds to certain graph-theoretic problems.

We point out recently designed algorithms that attain many of these lower bounds.
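The bound quoted above can be made concrete with a small Python sketch. It assumes a two-level memory model in which every matrix entry copied between slow and fast memory counts as one word, and it drops the constant factor hidden in Omega(...), so the numbers are order-of-magnitude estimates only. The function names (bandwidth_lower_bound, latency_lower_bound, blocked_matmul) are hypothetical and do not come from the report. blocked_matmul is the classical tiled version of the conventional O(n^3) algorithm, used here only as a familiar example of an algorithm whose communication matches the bound up to a constant factor.

```python
import math


def bandwidth_lower_bound(flops, M):
    """Order-of-magnitude estimate of Omega(#arithmetic operations / M^(1/2)).
    The constant factor is dropped, so this is not a rigorous bound."""
    return flops / math.sqrt(M)


def latency_lower_bound(flops, M):
    """A message carries at most M words, so #messages >= #words / M."""
    return bandwidth_lower_bound(flops, M) / M


def blocked_matmul(A, B, M):
    """Conventional O(n^3) matmul, tiled so that one b-by-b tile each of
    A, B and C fits in a fast memory of M words (3*b*b <= M).
    Returns (C, words), where words counts entries moved between slow
    and fast memory under the assumed two-level model."""
    n = len(A)
    b = max(1, math.isqrt(M // 3))
    C = [[0] * n for _ in range(n)]
    words = 0
    for i0 in range(0, n, b):
        i1 = min(i0 + b, n)
        for j0 in range(0, n, b):
            j1 = min(j0 + b, n)
            words += (i1 - i0) * (j1 - j0)  # load the C tile
            for k0 in range(0, n, b):
                k1 = min(k0 + b, n)
                # load one tile of A and one tile of B
                words += (i1 - i0) * (k1 - k0) + (k1 - k0) * (j1 - j0)
                for i in range(i0, i1):
                    for k in range(k0, k1):
                        a = A[i][k]
                        for j in range(j0, j1):
                            C[i][j] += a * B[k][j]
            words += (i1 - i0) * (j1 - j0)  # store the C tile
    return C, words


n, M = 48, 768  # tile size b = isqrt(768 // 3) = 16
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i - j for j in range(n)] for i in range(n)]
C, words = blocked_matmul(A, B, M)
print(words)                                         # 18432 = 2n^2 + 2n^3/b
print(round(bandwidth_lower_bound(n ** 3, M)))       # 3991
print(bandwidth_lower_bound(n ** 3 / 4, n ** 2 // 4))  # 1152.0 (parallel, P = 4)
```

With b = 16 the tiled algorithm moves 2n^2 + 2n^3/b words, which stays within a small constant factor of n^3/M^(1/2) because b grows like M^(1/2). The last line applies the same formula to the parallel case with P = 4 processors, each holding M = n^2/P words and performing n^3/P arithmetic operations, which gives n^2/P^(1/2) words per processor.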


BibTeX citation:

@techreport{Ballard:EECS-2011-15,
    Author = {Ballard, Grey and Demmel, James and Holtz, Olga and Schwartz, Oded},
    Title = {Minimizing Communication in Numerical Linear Algebra},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2011},
    Month = {Feb},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-15.html},
    Number = {UCB/EECS-2011-15},
    Abstract = {In 1981 Hong and Kung proved a lower bound on the amount of communication (amount of data moved between a small, fast memory and large, slow memory) needed to perform dense, n-by-n matrix multiplication using the conventional O(n^3) algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it to the parallel case (where communication means the amount of data moved between processors). In both cases the lower bound may be expressed as Omega(#arithmetic operations / M^(1/2)), where M is the size of the fast memory (or local memory in the parallel case).

Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, LDL^T factorization, QR factorization, Gram--Schmidt algorithm, algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra.

The proof works for dense or sparse matrices, and for sequential or parallel algorithms. In addition to lower bounds on the amount of data moved (bandwidth-cost), we get lower bounds on the number of messages required to move it (latency-cost).

We extend our lower bound technique to compositions of linear algebra operations (like computing powers of a matrix), to decide whether it is enough to call a sequence of simpler optimal algorithms (like matrix multiplication) to minimize communication, or if we can do better. We give examples of both. We also show how to extend our lower bounds to certain graph-theoretic problems.

We point out recently designed algorithms that attain many of these lower bounds.}
}

EndNote citation:

%0 Report
%A Ballard, Grey
%A Demmel, James
%A Holtz, Olga
%A Schwartz, Oded
%T Minimizing Communication in Numerical Linear Algebra
%I EECS Department, University of California, Berkeley
%D 2011
%8 February 28
%@ UCB/EECS-2011-15
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-15.html
%F Ballard:EECS-2011-15