Power Reduction Techniques for Multimedia Processors
Dominic Aldo Antonelli and Alan J. Smith
Multimedia processing algorithms such as video encoding and decoding are becoming increasingly common. Nearly every home computer performs some form of multimedia processing on a regular basis, as do many dedicated systems such as DVD and MP3 players. A number of architecture extensions, such as MMX and SSE, have been added to general-purpose CPUs to handle such tasks efficiently. In addition, there is a growing number of dedicated multimedia processors, such as the TriMedia family, that make such extensions a core part of their instruction set architecture. These extensions generally provide features such as saturating arithmetic and sub-word single-instruction multiple-data (SIMD) operations, which significantly improve performance and power efficiency for these applications.
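As a rough illustration of saturating arithmetic combined with sub-word SIMD, the following sketch emulates in software a lane-wise add on four unsigned 8-bit values packed into one 32-bit word. The function names and the 4x8-bit layout are assumptions chosen for illustration; they do not correspond to any specific MMX, SSE, or TriMedia instruction.

```python
LANES = 4                        # assumed layout: four 8-bit lanes per 32-bit word
LANE_BITS = 8
LANE_MAX = (1 << LANE_BITS) - 1  # 255, the largest unsigned 8-bit value


def unpack(word):
    """Split a 32-bit word into four unsigned 8-bit lanes (lane 0 = lowest byte)."""
    return [(word >> (LANE_BITS * i)) & LANE_MAX for i in range(LANES)]


def pack(lanes):
    """Pack four unsigned 8-bit lanes back into one 32-bit word."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & LANE_MAX) << (LANE_BITS * i)
    return word


def add_u8x4_saturating(a, b):
    """Lane-wise add that clamps each lane at 255 instead of wrapping (hypothetical name)."""
    return pack([min(x + y, LANE_MAX) for x, y in zip(unpack(a), unpack(b))])


def add_u8x4_wrapping(a, b):
    """Lane-wise add with ordinary modulo-256 wraparound, shown for comparison."""
    return pack([(x + y) & LANE_MAX for x, y in zip(unpack(a), unpack(b))])
```

With pixel-like data, the wrapping version turns an overflowing bright value into a dark one, whereas the saturating version clamps it at full brightness. A hardware saturating SIMD instruction produces the clamped result for all lanes at once, without the per-lane compare-and-branch sequence that scalar code would need.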
In multimedia processors, the real goal of optimization is not raw performance, but rather minimizing energy use while delivering a particular real-time standard of performance. Considerable research has addressed power reduction in this type of device, and many microarchitectural techniques, such as fine-grained clock gating and phased caches, have been proposed and used in industry. Unfortunately, much of the analysis of how effective such techniques are relies on benchmark code that is not optimized for the target architecture, even though nearly all production code is highly optimized. Many techniques behave very differently when applied to optimized code. In addition, many multimedia processors have advanced features, such as software-controlled hardware prefetching, that drastically change the behavior of parts of the processor (the caches, in the case of hardware prefetching). Most research assumes fairly simple processors without such features, and thus can give inaccurate estimates of how effective the techniques really are. Academic research has also tended to use overly simplified models of power consumption and performance, and has often been directed at custom logic designs rather than the heavily synthesized logic typically used in dedicated multimedia chips and SoCs.
The primary goal of this research is to reduce the power consumption of multimedia processors that have state-of-the-art performance-enhancing features and run highly optimized code. To that end, we are analyzing existing microarchitecture-level power-reduction schemes, as well as exploring new techniques. To be as accurate as possible, we use a cycle-accurate simulator of a real processor, an industrial-strength synthesis and simulation package with power analysis tools, and real production silicon to obtain accurate power consumption data. In addition, we are working closely with the engineers at NXP who design and build TriMedia processors, in order to keep this research practical and to gain the perspective of people who design real systems. This collaboration also gives us access to tools unavailable to most other researchers, such as cycle-accurate simulators of real machines and production silicon implementations of multimedia processors.