Analysis of Multithreaded Architectures for Parallel Computing

Rafael H. Saavedra-Barrera, David E. Culler and Thorsten von Eicken

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-90-569
April 1990

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1990/CSD-90-569.pdf

Multithreading has been proposed as an architectural strategy for tolerating latency in multiprocessors and, through limited empirical studies, shown to offer promise. This paper develops an analytical model of multithreaded processor behavior based on a small set of architectural and program parameters. The model gives rise to a large Markov chain, which is solved to obtain a formula for processor efficiency in terms of the number of threads per processor, the remote performance rate, the latency, and the cost of switching between threads. It is shown that a multithreaded processor exhibits three operating regimes: linear (efficiency is proportional to the number of threads), transition, and saturation (efficiency depends only on the remote reference rate and switch cost). Formulae for regime boundaries are derived. The model is embellished to reflect cache degradation due to multithreading, using an analytical model of cache behavior, demonstrating that returns diminish as the number threads becomes large. Predictions from the embellished model correlate well with published empirical measurements. Prescriptive use of the model under various scenarios indicates that multithreading is effective, but the number of useful threads per processor is fairly small.


BibTeX citation:

@techreport{Saavedra-Barrera:CSD-90-569,
    Author = {Saavedra-Barrera, Rafael H. and Culler, David E. and von Eicken, Thorsten},
    Title = {Analysis of Multithreaded Architectures for Parallel Computing},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {1990},
    Month = {Apr},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1990/6174.html},
    Number = {UCB/CSD-90-569},
    Abstract = {Multithreading has been proposed as an architectural strategy for tolerating latency in multiprocessors and, through limited empirical studies, shown to offer promise. This paper develops an analytical model of multithreaded processor behavior based on a small set of architectural and program parameters. The model gives rise to a large Markov chain, which is solved to obtain a formula for processor efficiency in terms of the number of threads per processor, the remote performance rate, the latency, and the cost of switching between threads. It is shown that a multithreaded processor exhibits three operating regimes: linear (efficiency is proportional to the number of threads), transition, and saturation (efficiency depends only on the remote reference rate and switch cost). Formulae for regime boundaries are derived. The model is embellished to reflect cache degradation due to multithreading, using an analytical model of cache behavior, demonstrating that returns diminish as the number threads becomes large. Predictions from the embellished model correlate well with published empirical measurements. Prescriptive use of the model under various scenarios indicates that multithreading is effective, but the number of useful threads per processor is fairly small.}
}

EndNote citation:

%0 Report
%A Saavedra-Barrera, Rafael H.
%A Culler, David E.
%A von Eicken, Thorsten
%T Analysis of Multithreaded Architectures for Parallel Computing
%I EECS Department, University of California, Berkeley
%D 1990
%@ UCB/CSD-90-569
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1990/6174.html
%F Saavedra-Barrera:CSD-90-569