Electrical Engineering
      and Computer Sciences

Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

Improving communication performance in dense linear algebra via topology aware collectives

Edgar Solomonik, Abhinav Bhatele and James Demmel

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2011-92
August 15, 2011

http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-92.pdf

Recent results have shown that topology aware mapping reduces network contention in communication-intensive kernels on massively parallel machines. We demonstrate that on mesh interconnects, topology aware mapping also allows for the utilization of highly-efficient topology aware collectives. We map novel 2.5D dense linear algebra algorithms to exploit rectangular collectives on cuboid partitions allocated by a Blue Gene/P supercomputer. Our mappings allow the algorithms to exploit optimized line multicasts and reductions. Commonly used 2D algorithms cannot be mapped in this fashion. On 16,384 nodes (65,536 cores) of Blue Gene/P, 2.5D algorithms that exploit rectangular collectives are significantly faster than 2D matrix multiplication (MM) and LU factorization, up to 8.7x and 2.1x, respectively. These speed-ups are due to communication reduction (up to 95.6% for 2.5D MM with respect to 2D MM). We also derive LogP-based novel performance models for rectangular broadcasts and reductions. Using those, we model the performance of matrix multiplication and LU factorization on a hypothetical exascale architecture.


BibTeX citation:

@techreport{Solomonik:EECS-2011-92,
    Author = {Solomonik, Edgar and Bhatele, Abhinav and Demmel, James},
    Title = {Improving communication performance in dense linear algebra via topology aware collectives},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2011},
    Month = {Aug},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-92.html},
    Number = {UCB/EECS-2011-92},
    Abstract = {Recent results have shown that topology aware mapping reduces network contention in communication-intensive kernels on massively parallel machines. We demonstrate that on mesh interconnects, topology aware mapping also allows for the utilization of highly-efficient topology aware collectives. We map novel 2.5D dense linear algebra algorithms to exploit rectangular collectives on cuboid partitions allocated by a Blue Gene/P supercomputer. Our mappings allow the algorithms to exploit optimized line multicasts and reductions. Commonly used 2D algorithms cannot be mapped in this fashion. On 16,384 nodes (65,536 cores) of Blue Gene/P, 2.5D algorithms that exploit rectangular collectives are significantly faster than 2D matrix multiplication (MM) and LU factorization, up to 8.7x and 2.1x, respectively. These speed-ups are due to communication reduction (up to 95.6% for 2.5D MM with respect to 2D MM). We also derive LogP-based novel performance models for rectangular broadcasts and reductions. Using those, we model the performance of matrix multiplication and LU factorization on a hypothetical exascale architecture.}
}

EndNote citation:

%0 Report
%A Solomonik, Edgar
%A Bhatele, Abhinav
%A Demmel, James
%T Improving communication performance in dense linear algebra via topology aware collectives
%I EECS Department, University of California, Berkeley
%D 2011
%8 August 15
%@ UCB/EECS-2011-92
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2011/EECS-2011-92.html
%F Solomonik:EECS-2011-92