Low-complexity Vector Microprocessor Extensions
Joseph James Gebis
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2008-47
May 6, 2008
http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-47.pdf
For the last few years, single-thread performance has been improving at a snail's pace. Power limitations, increasing relative memory latency, and the exhaustion of improvement in instruction-level parallelism are forcing microprocessor architects to examine new processor design strategies. In this dissertation, I take a look at a technology that can improve the efficiency of modern microprocessors: vectors. Vectors are a simple, power-efficient way to take advantage of common data-level parallelism in an extensible, easily-programmable manner. My work focuses on the process of transitioning from traditional scalar microprocessors to computers that can take advantage of vectors.
First, I describe a process for extending existing single-instruction, multiple-data instruction sets to support full vector processing, in a way that remains binary compatible with existing applications. Initial implementations can be low cost, but be transparently extended to higher performance later.
I also describe ViVA, the Virtual Vector Architecture. ViVA adds vector-style memory operations to existing microprocessors but does not include arithmetic datapaths; instead, memory instructions work with a new buffer placed between the core and second-level cache. ViVA serves as a low-cost solution to getting much of the performance of full vector memory hierarchies while avoiding the complexity of adding a full vector system.
Finally, I test the performance of ViVA by modifying a cycle-accurate full-system simulator to support ViVA's operation. After extensive calibration, I test the basic performance of ViVA using a series of microbenchmarks. I compare the performance of a variety of ViVA configurations for corner turn, used in processing multidimensional data, and sparse matrix-vector multiplication, used in many scientific applications. Results show that ViVA can give significant benefit for a variety of memory access patterns, without relying on a costly hardware prefetcher.
Advisor: David A. Patterson
BibTeX citation:
@phdthesis{Gebis:EECS-2008-47,
Author = {Gebis, Joseph James},
Title = {Low-complexity Vector Microprocessor Extensions},
School = {EECS Department, University of California, Berkeley},
Year = {2008},
Month = {May},
URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-47.html},
Number = {UCB/EECS-2008-47},
Abstract = {For the last few years, single-thread performance has been improving at a
snail's pace. Power limitations, increasing relative memory latency, and
the exhaustion of improvement in instruction-level parallelism are
forcing microprocessor architects to examine new processor design
strategies. In this dissertation, I take a look at a technology that can
improve the efficiency of modern microprocessors: vectors. Vectors are a
simple, power-efficient way to take advantage of common data-level
parallelism in an extensible, easily-programmable manner. My work
focuses on the process of transitioning from traditional scalar
microprocessors to computers that can take advantage of vectors.
First, I describe a process for extending existing single-instruction,
multiple-data instruction sets to support full vector processing, in a
way that remains binary compatible with existing applications. Initial
implementations can be low cost, but be transparently extended to higher
performance later.
I also describe ViVA, the Virtual Vector Architecture. ViVA adds
vector-style memory operations to existing microprocessors but does not
include arithmetic datapaths; instead, memory instructions work with a
new buffer placed between the core and second-level cache. ViVA serves
as a low-cost solution to getting much of the performance of full vector
memory hierarchies while avoiding the complexity of adding a full vector
system.
Finally, I test the performance of ViVA by modifying a cycle-accurate
full-system simulator to support ViVA's operation. After extensive
calibration, I test the basic performance of ViVA using a series of
microbenchmarks. I compare the performance of a variety of ViVA
configurations for corner turn, used in processing multidimensional data,
and sparse matrix-vector multiplication, used in many scientific
applications. Results show that ViVA can give significant benefit for a
variety of memory access patterns, without relying on a costly hardware
prefetcher.}
}
EndNote citation:
%0 Thesis
%A Gebis, Joseph James
%T Low-complexity Vector Microprocessor Extensions
%I EECS Department, University of California, Berkeley
%D 2008
%8 May 6
%@ UCB/EECS-2008-47
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-47.html
%F Gebis:EECS-2008-47
