Audio Coding Using Frequency-Domain Linear Prediction
Vijay Nayak Ullal, Petr Motlicek¹ and Nelson Morgan
Because the dynamics of speech vary rapidly over time, most speech analysis techniques assume short-term stationarity within the signal. Many existing speech coders use classical linear prediction (LP) to approximate the speech signal's spectral envelope over short frames, usually between 10 and 30 milliseconds. Because LP-based coding relies on the source-filter model of speech production, these techniques do not work well for other types of audio signals, such as music. However, the evolution of the vocal tract shape can be largely predictable, so it may be more efficient to encode longer segments of speech, on the order of hundreds of milliseconds. This can be accomplished by modeling the signal's temporal evolution rather than its short-term spectra.
The coding method we use, called Frequency-Domain Linear Prediction (FDLP), approximates a signal's temporal evolution by fitting an autoregressive model to the signal's squared Hilbert envelope. The method is applied in critical-band-sized sub-bands. Encoding the Hilbert envelope is fairly straightforward, so the main goal of this research is to find more efficient ways to model and encode the residual information, known as the Hilbert carrier.
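The envelope-fitting step can be illustrated with a minimal sketch. It assumes the DCT-based formulation commonly used for FDLP: linear prediction is applied to the DCT coefficients of a long segment, and the resulting all-pole model, evaluated along the time axis, approximates the squared Hilbert envelope. The text above does not give implementation details, so the function names (dct_ii, autocorr, levinson_durbin, fdlp_envelope), the autocorrelation method, and the full-band processing (no sub-band decomposition, no quantization) are all illustrative assumptions, not the authors' implementation.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II, computed directly in O(N^2) for clarity."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def autocorr(c, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence c (no normalization)."""
    n = len(c)
    return [
        sum(c[i] * c[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err), where a[0] == 1 holds the coefficients of the
    prediction-error filter A(z) = sum_j a[j] z^-j and err is the
    residual energy.
    """
    if r[0] <= 0.0:
        raise ValueError("signal has no energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order):
    """Approximate the squared temporal envelope of x with an AR model.

    Assumption: LP is run on the DCT of the whole segment, so the model's
    "spectrum" err / |A|^2 is read along the time axis, not frequency.
    """
    n = len(x)
    coeffs = dct_ii(x)
    a, err = levinson_durbin(autocorr(coeffs, order), order)
    env = []
    for t in range(n):
        theta = math.pi * (t + 0.5) / n  # angle corresponding to sample t
        re = sum(a[j] * math.cos(j * theta) for j in range(order + 1))
        im = -sum(a[j] * math.sin(j * theta) for j in range(order + 1))
        env.append(err / (re * re + im * im))
    return env
```

Calling fdlp_envelope(segment, order) on a segment of samples returns one envelope value per sample. In this sketch the model order sets how much temporal detail the envelope keeps: a low order yields a smooth envelope with few peaks, and a higher order follows sharper transients at the cost of more parameters to encode.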
- P. Motlicek, H. Hermansky, H. Garudadri, and N. Srinivasamurthy, "Audio Coding Based on Long Temporal Contexts," Technical Report IDIAP-RR 06-30, http://www.idiap.ch, April 2006.
- A. S. Spanias, "Speech Coding: A Tutorial Review," Proc. of IEEE, Vol. 82, No. 10, October 1994.
- J. Herre, "Temporal Noise Shaping, Quantization, and Coding Methods in Perceptual Audio Coding: A Tutorial Introduction," AES 17th Int. Conf. High Quality Audio Coding, Florence, Italy, September 1999.
- M. Athineos, H. Hermansky, and D. P. W. Ellis, "LP-TRAP: Linear Predictive Temporal Patterns," Proc. ICSLP, Jeju, S. Korea, October 2004, pp. 1154-1157.
¹ Institut Dalle Molle d'Intelligence Artificielle Perceptive (IDIAP)