# Autotuning Sparse Matrix-Vector Multiplication for Multicore

### Jong-Ho Byun, Richard Lin, Katherine A. Yelick and James Demmel

###
EECS Department

University of California, Berkeley

Technical Report No. UCB/EECS-2012-215

November 28, 2012

### http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-215.pdf

Sparse matrix-vector multiplication (SpMV) is an important kernel in scientific and engineering computing. Straightforward parallel implementations of SpMV often perform poorly, and with the increasing variety of architectural features in multicore processors, it is getting more difficult to determine the sparse matrix data structure and corresponding SpMV implementation that optimize performance. In this paper we present pOSKI, an autotuning system for SpMV that automatically searches over a large set of possible data structures and implementations to optimize SpMV performance on multicore platforms. pOSKI explores a design space that depends on both the nonzero pattern of the sparse matrix, typically not known until run-time, and the architecture, which is explored off-line as much as possible, in order to reduce tuning time. We demonstrate significant performance improvements compared to previous serial and parallel implementations, and compare performance to upper bounds based on architectural models.

BibTeX citation:

@techreport{Byun:EECS-2012-215, Author = {Byun, Jong-Ho and Lin, Richard and Yelick, Katherine A. and Demmel, James}, Title = {Autotuning Sparse Matrix-Vector Multiplication for Multicore}, Institution = {EECS Department, University of California, Berkeley}, Year = {2012}, Month = {Nov}, URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-215.html}, Number = {UCB/EECS-2012-215}, Abstract = {Sparse matrix-vector multiplication (SpMV) is an important kernel in scientific and engineering computing. Straightforward parallel implementations of SpMV often perform poorly, and with the increasing variety of architectural features in multicore processors, it is getting more difficult to determine the sparse matrix data structure and corresponding SpMV implementation that optimize performance. In this paper we present pOSKI, an autotuning system for SpMV that automatically searches over a large set of possible data structures and implementations to optimize SpMV performance on multicore platforms. pOSKI explores a design space that depends on both the nonzero pattern of the sparse matrix, typically not known until run-time, and the architecture, which is explored off-line as much as possible, in order to reduce tuning time. We demonstrate significant performance improvements compared to previous serial and parallel implementations, and compare performance to upper bounds based on architectural models.} }

EndNote citation:

%0 Report %A Byun, Jong-Ho %A Lin, Richard %A Yelick, Katherine A. %A Demmel, James %T Autotuning Sparse Matrix-Vector Multiplication for Multicore %I EECS Department, University of California, Berkeley %D 2012 %8 November 28 %@ UCB/EECS-2012-215 %U http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-215.html %F Byun:EECS-2012-215