539 Orion: a power-performance simulator for interconnection networks
193 Cherry: checkpointed early resource recycling in out-of-order microprocessors
170 Using modern graphics architectures for general-purpose computing: a framework and analysis
156 Microarchitectural exploration with Liberty
141 Drowsy instruction caches: leakage power reduction using dynamic voltage scaling and cache sub-bank prediction
139 Reducing register ports for higher speed and lower energy
136 Dynamic frequency and voltage control for a multiple clock domain microarchitecture
130 Master/slave speculative parallelization
105 Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks
103 Optimizing pipelines for power and performance
92 DELI: a new run-time control point
91 Convergent scheduling
89 Compiler-directed instruction cache leakage optimization
88 Pointer cache assisted prefetching
77 Hierarchical scheduling windows
74 Energy efficient frequent value data cache design
62 Managing static leakage energy in microprocessor functional units
59 Register write specialization register read specialization: a path to complexity-effective wide-issue superscalar processors
53 Microarchitectural denial of service: insuring microarchitectural fairness
50 Exploiting data-width locality to increase superscalar execution bandwidth
45 Fetching instruction streams
42 Characterizing and predicting value degree of use
35 A faster optimal register allocator
35 Power protocol: reducing power dissipation on off-chip data buses
32 Vacuum packing: extracting hardware-detected program phases for post-link optimization
28 Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor
25 A quantitative framework for automated pre-execution thread selection
22 Generating physical addresses directly for saving instruction TLB energy
19 Compiling for instruction cache performance on a multithreaded architecture
18 Compiler managed micro-cache bypassing for high performance EPIC processors
16 Three extensions to register integration
13 Reduced code size modulo scheduling in the absence of hardware support
13 Three-dimensional memory vectorization for high bandwidth media memory systems
7 Instruction fetch deferral using static slack
7 Microarchitectural support for precomputation microthreads
6 Dynamic addressing memory arrays with physical locality