Data-Parallel Large Vocabulary Continuous Speech Recognition on Graphics Processors

Jike Chong, Youngmin Yi, Arlo Faria, Nadathur Rajagopalan Satish, and Kurt Keutzer

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2008-69

May 22, 2008

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-69.pdf

Automatic speech recognition is a key technology for enabling rich human-computer interaction in emerging applications. Hidden Markov Model (HMM)-based recognition approaches are widely used to model the human speech process, constructing probabilistic estimates of the underlying word sequence from an acoustic signal. High-accuracy speech recognition, however, requires complex models, large vocabulary sizes, and exploration of a very large search space, making the computation too intensive for current personal and mobile platforms. In this paper, we explore opportunities for parallelizing the HMM-based Viterbi search algorithm typically used for large-vocabulary continuous speech recognition (LVCSR), and present an efficient implementation on current many-core platforms. For the case study, we use a recognition model of 50,000 English words with more than 500,000 word bigram transitions and one million hidden states. We examine important implementation tradeoffs for shared-memory single-chip many-core processors by implementing LVCSR on the NVIDIA G80 Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA), achieving significant speedups. This work is an important step toward enabling LVCSR-based applications to leverage many-core processors for real-time performance on personal and mobile computing platforms.
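To make the abstract's core computation concrete: the Viterbi search it refers to is a dynamic program over HMM states, maximizing over predecessor states at each time step. The sketch below is not the authors' CUDA implementation; it is a minimal log-domain Viterbi decoder for a toy discrete HMM (all names and the NumPy formulation are illustrative assumptions). The per-timestep max over all state pairs is the operation the paper parallelizes across GPU threads.

```python
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    """Most likely HMM state path for an observation sequence.

    obs:    observation symbol indices, length T
    log_pi: initial state log-probabilities, shape (N,)
    log_A:  transition log-probabilities, shape (N, N), log_A[i, j] = log P(j | i)
    log_B:  emission log-probabilities, shape (N, M)

    Returns (best_path, best_log_prob).
    """
    T, N = len(obs), len(log_pi)
    delta = np.empty((T, N))            # best log-prob of any path ending in each state
    psi = np.zeros((T, N), dtype=int)   # backpointers to the best predecessor

    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        # scores[i, j] = log-prob of best path ending in i, then taking i -> j.
        # This N x N maximization is the data-parallel kernel of the search.
        scores = delta[t - 1][:, None] + log_A
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]

    # Backtrace from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())
```

In an LVCSR system of the scale described (around one million hidden states), the inner maximization is pruned with a beam so only active states are expanded; the sketch omits pruning for clarity.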


BibTeX citation:

@techreport{Chong:EECS-2008-69,
    Author= {Chong, Jike and Yi, Youngmin and Faria, Arlo and Satish, Nadathur Rajagopalan and Keutzer, Kurt},
    Title= {Data-Parallel Large Vocabulary Continuous Speech Recognition on Graphics Processors},
    Year= {2008},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-69.html},
    Number= {UCB/EECS-2008-69},
    Abstract= {Automatic speech recognition is a key technology for enabling rich human-computer interaction in emerging applications. Hidden Markov Model (HMM)-based recognition approaches are widely used to model the human speech process, constructing probabilistic estimates of the underlying word sequence from an acoustic signal. High-accuracy speech recognition, however, requires complex models, large vocabulary sizes, and exploration of a very large search space, making the computation too intensive for current personal and mobile platforms. In this paper, we explore opportunities for parallelizing the HMM-based Viterbi search algorithm typically used for large-vocabulary continuous speech recognition (LVCSR), and present an efficient implementation on current many-core platforms. For the case study, we use a recognition model of 50,000 English words with more than 500,000 word bigram transitions and one million hidden states. We examine important implementation tradeoffs for shared-memory single-chip many-core processors by implementing LVCSR on the NVIDIA G80 Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA), achieving significant speedups. This work is an important step toward enabling LVCSR-based applications to leverage many-core processors for real-time performance on personal and mobile computing platforms.},
}

EndNote citation:

%0 Report
%A Chong, Jike 
%A Yi, Youngmin 
%A Faria, Arlo 
%A Satish, Nadathur Rajagopalan 
%A Keutzer, Kurt 
%T Data-Parallel Large Vocabulary Continuous Speech Recognition on Graphics Processors
%I EECS Department, University of California, Berkeley
%D 2008
%8 May 22
%@ UCB/EECS-2008-69
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-69.html
%F Chong:EECS-2008-69