Research
Thesis research
My thesis project involves developing new algorithms in sparse and dense linear algebra,
with the goal of avoiding communication. Communication includes moving data
between processors ("parallel"), and moving data between levels of the memory hierarchy
("sequential"). For several decades, communication performance has gotten exponentially
worse relative to arithmetic performance. Current hardware trends, such as multicore
and off-chip compute accelerators, only exacerbate existing limitations in memory and
bus bandwidth. In fact, we cannot expect new developments in hardware to solve the problem
completely, because
Flops are cheap, bandwidth is money, latency is physics.
Floating-point (and integer) computational performance scales with the number
of transistors on a chip. Bandwidth scales with the cost of hardware (number of
wires, pipes, banks, disks, ...) and with energy consumption. Latency is the
hardest to improve, because it's often limited by physical laws (even the speed
of light is computationally significant).
These hard facts call for new algorithms that work with the physical and
economic laws of hardware. Rearranging algorithms to avoid communication can
often improve performance, even if it requires redundant computation. I've
applied this observation to iterative methods for sparse linear systems and
eigenvalue problems by rearranging the algorithms so that several steps of the
method can be performed at the same time, for the same latency cost as a single
step of the original algorithm. Performance tuning involves a subtle balance
between latency, bandwidth, redundant computation, and numerical accuracy.
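To make this concrete, here is a minimal serial sketch in Python with NumPy and
SciPy (the function name and the toy example are mine, not from any library) of
the kernel these methods are built on: computing the monomial Krylov basis
v, Av, A^2 v, ..., A^s v. A communication-avoiding implementation produces the
same s+1 vectors after a single round of messages, at the price of some redundant
work on overlapping "ghost" regions; the sketch shows only the mathematical
structure.

    import numpy as np
    import scipy.sparse as sp

    def monomial_krylov_basis(A, v, s):
        # Columns of V are v, A v, A^2 v, ..., A^s v. Done naively,
        # each sparse matrix-vector product (SpMV) below costs one
        # round of communication in parallel; a matrix powers kernel
        # produces all s+1 columns after a single round, plus some
        # redundant local arithmetic.
        V = np.empty((v.shape[0], s + 1))
        V[:, 0] = v
        for j in range(s):
            V[:, j + 1] = A @ V[:, j]
        return V

    # Toy example: 1-D Poisson matrix, basis length s = 4.
    n, s = 1000, 4
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    v = np.random.default_rng(0).standard_normal(n)
    V = monomial_krylov_basis(A, v, s)

The monomial basis becomes ill-conditioned as s grows, which is one reason
numerical accuracy enters the tuning balance above; better-conditioned choices
such as Newton or Chebyshev bases help at larger s.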
I've also worked out new preconditioning techniques that may make our algorithms
practically useful for a wide class of problems. On the dense side, I've helped
develop and implement a new QR factorization that communicates asymptotically
less than existing algorithms in LAPACK and ScaLAPACK, and have helped prove
lower bounds on the amount of communication a QR factorization must perform.
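The reduction idea behind that factorization, usually called TSQR ("tall skinny
QR"), fits in a few lines. Below is a toy Python sketch, again with names of my
own choosing, that computes only the R factor of a tall-and-skinny matrix given
as a list of row blocks: each block gets an independent local QR (no
communication), and the small R factors are then merged pairwise up a binary
tree, so a parallel version needs only about log2(P) rounds of messages on P
processors.

    import numpy as np

    def tsqr_r_factor(blocks):
        # Step 1: independent local QR of each row block.
        Rs = [np.linalg.qr(B, mode="r") for B in blocks]
        # Step 2: merge the R factors pairwise up a binary tree;
        # each tree level corresponds to one round of messages.
        while len(Rs) > 1:
            merged = [np.linalg.qr(np.vstack(Rs[i:i + 2]), mode="r")
                      for i in range(0, len(Rs) - 1, 2)]
            if len(Rs) % 2 == 1:  # odd block out passes through
                merged.append(Rs[-1])
            Rs = merged
        return Rs[0]

    # Toy example: a 4000-by-50 matrix split into 4 row blocks.
    rng = np.random.default_rng(1)
    blocks = [rng.standard_normal((1000, 50)) for _ in range(4)]
    R = tsqr_r_factor(blocks)
    # Agrees with a direct QR of the stacked matrix, up to row signs.
    R_ref = np.linalg.qr(np.vstack(blocks), mode="r")
    assert np.allclose(np.abs(R), np.abs(R_ref))

A real implementation also keeps an implicit representation of Q from the tree
and supports other reduction shapes (a sequential chain for out-of-core use, a
tree for parallel use); this sketch keeps only R for brevity.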
Publications and talks
Publications
- "Communication-avoiding Krylov
subspace methods." Mark Hoemmen. PhD thesis, University
of California Berkeley EECS Department, 2010. Also available
as UC Berkeley Technical Report UCB/EECS-2010-37.
- "Implementing communication-optimal parallel and sequential QR
factorizations." James Demmel, Laura Grigori, Mark Hoemmen, and
Julien Langou. Submitted to SIAM Journal on Scientific Computing.
Available on arXiv.
- "Communication-optimal parallel and sequential QR and LU
factorizations." James Demmel, Laura Grigori, Mark Hoemmen,
and Julien Langou. UC Berkeley technical report
(UCB/EECS-2008-89) and LAPACK
Working Note #204, May 2008. Submitted to SIAM Journal on
Scientific Computing.
- "Non-negative
diagonals and high performance on low-profile matrices from
Householder QR." James Demmel, Mark Hoemmen, Yozo Hida,
and E. Jason Riedy. SIAM Journal on Scientific Computing,
Volume 31, Issue 4, pp. 2832-2841, July 2009. (See also UC
Berkeley technical report (UCB/EECS-2008-76) and LAPACK
Working Note #203, May 2008.)
- "Avoiding Communication in Sparse Matrix Computations."
James Demmel, Mark Hoemmen, Marghoob Mohiyuddin, and Katherine Yelick.
IEEE International Parallel and Distributed Processing Symposium, April 2008.
- "Avoiding communication in computing Krylov subspaces."
James Demmel, Mark Hoemmen, Marghoob Mohiyuddin, and Katherine
Yelick. UC Berkeley technical report (UCB/EECS-2007-123), October 2007.
- "Benchmarking sparse matrix-vector multiply in five
minutes." With Hormozd Gahvari. SPEC 2007 (Austin, TX).
Conference and seminar talks
- "Communication-optimal linear algebra." Poster presented
at SciDAC 2009
(June) in San Diego, CA.
- "Communication-avoiding Krylov subspace methods." Presented
at the Copper Mountain 2008 Iterative Methods conference (April),
and at SIAM Parallel Processing 2008 (March).
- "Benchmarking sparse matrix-vector multiply in five minutes." Hormozd Gahvari,
Mark Hoemmen, James Demmel, and Katherine Yelick. SPEC Benchmark Workshop 2007,
Austin, TX, January 2007.
- "Communication-avoiding Krylov subspace methods." Presented
at the University of California Berkeley Matrix Computations
Seminar on 15 March 2006 and 15 November 2006.
- "A self-tuning benchmark for sparse matrix-vector multiplication." With
Hormozd Gahvari. SIAM CSE Conference, Orlando, FL, 12 February 2005.
- "Adaptable benchmarks for register blocked sparse matrix-vector
multiplication." Matrix Computations Seminar at UC Berkeley, 5 May 2004.
General interests
I'm interested in numerical linear algebra, performance tuning, programming
languages (especially parallel and embedded languages), and software engineering
for scientific computing, especially where it draws on ideas from higher-level
programming languages. I also have a background in the numerical solution of
ordinary and partial differential equations, as well as in numerical
optimization.
Mathematics knows no races or geographic boundaries; for mathematics,
the cultural world is one country.
-- David Hilbert
Last updated 09 July 2009.