| | |
| --- | --- |
| 137 | Automatic Thread Extraction with Decoupled Software Pipelining |
| 108 | A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance |
| 105 | Stream Programming on General-Purpose Processors |
| 71 | A Mechanism for Online Diagnosis of Hard Faults in Microprocessors |
| 62 | The TM3270 Media-Processor |
| 59 | Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor |
| 48 | Scalable Store-Load Forwarding via Store Queue Index Prediction |
| 37 | Thermal Management of On-Chip Caches Through Power Density Minimization |
| 36 | Shader Performance Analysis on a Modern GPU Architecture |
| 34 | A Quantum Logic Array Microarchitecture: Scalable Quantum Data Movement and Computation |
| 33 | Address-Indexed Memory Disambiguation and Store-to-Load Forwarding |
| 33 | ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing |
| 32 | Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors |
| 29 | The Cell Processor Architecture |
| 28 | Pinot: Speculative Multi-threading Processor Architecture Exploiting Parallelism over a Wide Range of Granularities |
| 27 | Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution |
| 26 | A Criticality Analysis of Clustering in Superscalar Processors |
| 25 | Improving Region Selection in Dynamic Optimization Systems |
| 25 | Store Memory-Level Parallelism Optimizations for Commercial Applications |
| 25 | μComplexity: Estimating Processor Design Effort |
| 25 | Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System |
| 23 | Continuous Path and Edge Profiling |
| 21 | Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns |
| 21 | Flea-flicker Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense |
| 20 | How to Fake 1000 Registers |
| 17 | Exploiting Vector Parallelism in Software Pipelined Loops |
| 13 | Reducing Instruction Fetch Cost by Packing Instructions into Register Windows |
| 12 | Balancing Resource Utilization to Mitigate Power Density in Processor Pipelines |
| 10 | The Future Evolution of High-Performance Microprocessors |
| 6 | Efficient Use of Invisible Registers in Thumb Code |
| 5 | Incremental Commit Groups for Non-Atomic Trace Processing |