Data-Parallel Large Vocabulary Continuous Speech Recognition on Manycore Processors
Jike Chong, Youngmin Yi, Arlo Faria, Nadathur Rajagopalan Satish and Kurt Keutzer
Gigascale Systems Research Center
Automatic speech recognition is a key technology for enabling rich human-computer interaction in emerging applications. Hidden Markov Model (HMM) based recognition approaches are widely used for modeling the human speech process by constructing probabilistic estimates of the underlying word sequence from an acoustic signal. High-accuracy speech recognition, however, requires complex models, large vocabulary sizes, and exploration of a very large search space, making the computation too intense for current personal and mobile platforms.
In this work, we explore opportunities for parallelizing the HMM-based Viterbi search algorithm typically used for large-vocabulary continuous speech recognition (LVCSR), and present an efficient implementation on current manycore platforms. As a case study, we use a recognition model of 50,000 English words with more than 500,000 word-to-word transition probabilities.
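For readers unfamiliar with the underlying algorithm: Viterbi search is a dynamic-programming procedure that tracks, for each model state at each time step, the most probable path reaching that state. The following is a minimal single-threaded sketch in Python with a hypothetical toy model; it illustrates only the algorithmic core, not the data-parallel CUDA implementation described in this paper.

```python
# Illustrative Viterbi decoding over a small HMM in the log domain.
# All model values are hypothetical and chosen only for demonstration.
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Return the most likely state path and its log-probability."""
    # Initialize with the first observation.
    dp = [{s: log_init[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        dp.append({})
        back.append({})
        for s in states:
            # Pick the best predecessor state for s at time t.
            prev, score = max(
                ((p, dp[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda x: x[1])
            dp[t][s] = score + log_emit[s][obs[t]]
            back[t][s] = prev
    # Backtrace from the best final state.
    last = max(states, key=lambda s: dp[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1], dp[-1][last]

# Hypothetical two-state toy model (not from the paper's 50,000-word system).
states = ('Healthy', 'Fever')
log_init = {'Healthy': math.log(0.6), 'Fever': math.log(0.4)}
log_trans = {'Healthy': {'Healthy': math.log(0.7), 'Fever': math.log(0.3)},
             'Fever':   {'Healthy': math.log(0.4), 'Fever': math.log(0.6)}}
log_emit = {'Healthy': {'normal': math.log(0.5), 'cold': math.log(0.4),
                        'dizzy': math.log(0.1)},
            'Fever':   {'normal': math.log(0.1), 'cold': math.log(0.3),
                        'dizzy': math.log(0.6)}}
path, logp = viterbi(('normal', 'cold', 'dizzy'), states,
                     log_init, log_trans, log_emit)
```

In an LVCSR system the state space is vastly larger, and the inner maximization over predecessor states at each time step is the natural target for data-parallel execution on a manycore processor.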
We examine important implementation tradeoffs for shared-memory single-chip manycore processors by implementing LVCSR on the NVIDIA Graphics Processing Unit (GPU) in Compute Unified Device Architecture (CUDA), achieving significant speedups. We are extending this work to handle WFST-based recognition networks, and plan to evaluate the potential speedups on other manycore processors as they become available.