Wideband Audio Coding Using Frequency-Domain Linear Prediction
Vijay Ullal, Petr Motlicek and Nelson Morgan
Current wideband audio codecs do not always compress speech well. We are developing a technique that encodes both speech and music efficiently. Our technique uses Frequency-Domain Linear Prediction (FDLP), which models a signal's temporal evolution by fitting an autoregressive model to the signal's squared Hilbert envelope. The method is applied in sub-bands over long temporal segments, on the order of one second. We believe that our FDLP method is more resilient to dropouts than current codecs. Because the technique has a longer algorithmic delay, we focus on low-to-medium bit rate applications where latency requirements are less stringent.
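As an illustration of the FDLP idea described above, the following is a minimal full-band sketch, not the codec's actual implementation. It assumes the common formulation in which linear prediction is applied to the DCT of a segment, so that the resulting all-pole model, read along the time axis, approximates the squared Hilbert envelope up to a scale factor. The function names (dct_ii, levinson_durbin, fdlp_envelope, demo_signal), the model order of 20, and the test signal are all hypothetical choices made for illustration; sub-band decomposition and quantization are omitted.

```python
import numpy as np


def dct_ii(x):
    """Unnormalized DCT-II via an explicit cosine matrix (O(N^2), fine for a sketch)."""
    n = len(x)
    k = np.arange(n)[:, None]  # DCT index (rows)
    t = np.arange(n)[None, :]  # time index (columns)
    return np.cos(np.pi * k * (t + 0.5) / n) @ x


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations; return (a, prediction error)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order=20):
    """Approximate the squared Hilbert envelope of x (up to scale) with an AR model."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = dct_ii(x)
    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.correlate(c, c, mode="full")[n - 1:n + order]
    r[0] *= 1.0 + 1e-9  # tiny white-noise correction for numerical safety (assumption)
    a, err = levinson_durbin(r, order)
    # The AR model's "frequency" axis (0..pi) maps onto time (0..n) in FDLP.
    omega = np.pi * (np.arange(n) + 0.5) / n
    resp = np.exp(-1j * np.outer(omega, np.arange(order + 1))) @ a
    return err / np.abs(resp) ** 2


def demo_signal(n=512, center=350, width=20.0):
    """Hypothetical test input: a tone with a Gaussian amplitude burst on a low floor."""
    t = np.arange(n)
    amplitude = 0.1 + np.exp(-0.5 * ((t - center) / width) ** 2)
    return amplitude * np.cos(0.3 * np.pi * t)


if __name__ == "__main__":
    env = fdlp_envelope(demo_signal())
    print("envelope peak at sample", int(np.argmax(env)))
```

In this sketch only the order + 1 predictor coefficients and a gain describe the envelope of the whole segment, which is why long segments can be summarized compactly; dividing the signal by the square root of the modeled envelope would leave the residual that still has to be coded.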
Through objective measures and a series of listening tests, we have shown that our technique performs comparably to state-of-the-art codecs at 64 kbps. Our current goal is to reduce the bit rate further while maintaining quality. This will require finding more efficient ways to model and encode the residual information, known as the Hilbert carrier. We plan to use psychoacoustic principles, such as simultaneous and temporal masking, together with entropy coding, among other methods.
- M. Athineos, H. Hermansky, and D. P. W. Ellis, "LP-TRAP: Linear Predictive Temporal Patterns," Proc. ICSLP, Jeju, S. Korea, October 2004, pp. 1154-1157.
- P. Motlicek, H. Hermansky, H. Garudadri, and N. Srinivasamurthy, "Audio Coding Based on Long Temporal Contexts," Technical Report IDIAP-RR 06-30, http://www.idiap.ch, April 2006.