| Citations | Title |
| --- | --- |
| 539 | Orion: a power-performance simulator for interconnection networks |
| 193 | Cherry: checkpointed early resource recycling in out-of-order microprocessors |
| 170 | Using modern graphics architectures for general-purpose computing: a framework and analysis |
| 156 | Microarchitectural exploration with Liberty |
| 141 | Drowsy instruction caches: leakage power reduction using dynamic voltage scaling and cache sub-bank prediction |
| 139 | Reducing register ports for higher speed and lower energy |
| 136 | Dynamic frequency and voltage control for a multiple clock domain microarchitecture |
| 130 | Master/slave speculative parallelization |
| 105 | Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks |
| 103 | Optimizing pipelines for power and performance |
| 92 | DELI: a new run-time control point |
| 91 | Convergent scheduling |
| 89 | Compiler-directed instruction cache leakage optimization |
| 88 | Pointer cache assisted prefetching |
| 77 | Hierarchical scheduling windows |
| 74 | Energy efficient frequent value data cache design |
| 62 | Managing static leakage energy in microprocessor functional units |
| 59 | Register write specialization register read specialization: a path to complexity-effective wide-issue superscalar processors |
| 53 | Microarchitectural denial of service: insuring microarchitectural fairness |
| 50 | Exploiting data-width locality to increase superscalar execution bandwidth |
| 45 | Fetching instruction streams |
| 42 | Characterizing and predicting value degree of use |
| 35 | A faster optimal register allocator |
| 35 | Power protocol: reducing power dissipation on off-chip data buses |
| 32 | Vacuum packing: extracting hardware-detected program phases for post-link optimization |
| 28 | Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor |
| 25 | A quantitative framework for automated pre-execution thread selection |
| 22 | Generating physical addresses directly for saving instruction TLB energy |
| 19 | Compiling for instruction cache performance on a multithreaded architecture |
| 18 | Compiler managed micro-cache bypassing for high performance EPIC processors |
| 16 | Three extensions to register integration |
| 13 | Reduced code size modulo scheduling in the absence of hardware support |
| 13 | Three-dimensional memory vectorization for high bandwidth media memory systems |
| 7 | Instruction fetch deferral using static slack |
| 7 | Microarchitectural support for precomputation microthreads |
| 6 | Dynamic addressing memory arrays with physical locality |