Large Vocabulary Automatic Speech Recognition on Emerging Architectures

Adam Janin
(Professor Nelson H. Morgan)
(NSF) IIS-0121396 and Swiss Research Network IM2

Automatic speech recognition (ASR) provides a natural interface to small form-factor computers such as PDAs, since these platforms lack keyboards and large displays. However, robust large-vocabulary ASR requires hardware resources far beyond those available on current PDAs. Emerging architectures, such as Vector IRAM at UC Berkeley and Imagine at Stanford, provide a partial solution by delivering very high performance at relatively low power. For speech recognition to take advantage of these architectures, though, the components of the system must be redesigned with them in mind.

We are currently adapting the workstation-based ASR system used at ICSI to run efficiently on these architectures. Two of the three major components of ICSI's speech system, the acoustic front-end and the phoneme probability estimator, contain very regular computational kernels (FFT and matrix-matrix multiply, respectively), and they run extremely efficiently on both architectures. The third component, the decoder, performs a highly pruned (and therefore irregular) search through all possible utterances. The decoder is therefore the primary focus of our current effort.

Our initial implementation consists of a small vocabulary system. With a small vocabulary, it is not necessary to share state among similar words; rather, one can evaluate all the words separately. This allows an efficient, regular implementation. On IRAM, we arrange batches of words with total length equal to the vector length. On Imagine, we batch words such that the total length will fit in the cluster memory. We are in the process of analyzing the results of this approach.
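
The batching idea can be illustrated with a minimal sketch. The abstract does not say how words are assigned to batches, so the first-fit-decreasing heuristic below is an assumption, as are the function name `pack_words`, the per-word state counts, and the capacity of 8. The capacity stands in for the vector length on IRAM or the cluster memory size on Imagine.

```python
def pack_words(word_lengths, capacity):
    """Group words into batches whose total length is at most `capacity`.

    word_lengths: dict mapping word -> number of states (hypothetical values).
    capacity: vector length or cluster memory size, in states (assumed unit).
    Uses first-fit decreasing, a common bin-packing heuristic; this is an
    illustrative choice, not necessarily the method used in the actual system.
    """
    batches = []  # each entry tracks remaining room and the words assigned
    # Longest words first; ties broken alphabetically for determinism.
    ordered = sorted(word_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for word, length in ordered:
        if length > capacity:
            raise ValueError(f"{word!r} does not fit in one batch")
        for batch in batches:
            if batch["free"] >= length:
                batch["words"].append(word)
                batch["free"] -= length
                break
        else:
            # No existing batch has room, so open a new one.
            batches.append({"free": capacity - length, "words": [word]})
    return [batch["words"] for batch in batches]


# Hypothetical small vocabulary with made-up state counts.
lengths = {"yes": 3, "no": 2, "stop": 4, "go": 2, "seven": 5}
for batch in pack_words(lengths, capacity=8):
    print(batch, sum(lengths[w] for w in batch))
```

Because every word in a batch is evaluated independently, a full batch keeps all vector lanes busy with identical, regular work, which is the property the small vocabulary system relies on.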

Future work includes running a large vocabulary system on these architectures. This involves picking a search order that will maximize reuse of state from previous searches (e.g., if the word "architecture" has already been processed, most of the work can be reused for the word "architectural"). Language modeling, beam pruning, and least-upper-bound path calculations may also be accelerated on these architectures.
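
To make the effect of search order concrete, the sketch below counts how much work could be reused when each word shares a prefix with the word processed just before it. It is a simplification under stated assumptions: letters stand in for phonemes or decoder states, reuse is limited to the immediately preceding word, and the function names are hypothetical. It is not the planned decoder design.

```python
def shared_prefix_len(a, b):
    """Number of leading symbols that sequences a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reuse_for_order(words):
    """Return (reused, total) symbol counts for processing words in order.

    Each word is assumed to reuse the state computed for the prefix it
    shares with the previous word (an illustrative simplification).
    """
    reused = 0
    total = 0
    prev = ""
    for word in words:
        reused += shared_prefix_len(prev, word)
        total += len(word)
        prev = word
    return reused, total


vocab = ["architecture", "speech", "architectural"]
print(reuse_for_order(vocab))          # similar words separated
print(reuse_for_order(sorted(vocab)))  # similar words adjacent
```

Sorting places "architectural" and "architecture" next to each other, so their common prefix is computed once. A real system would more likely share state through a prefix tree over phoneme sequences, which is beyond this sketch.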


Send mail to the author: janin@icsi.berkeley.edu