137 Automatic Thread Extraction with Decoupled Software Pipelining
108 A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance
105 Stream Programming on General-Purpose Processors
71 A Mechanism for Online Diagnosis of Hard Faults in Microprocessors
62 The TM3270 Media-Processor
59 Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor
48 Scalable Store-Load Forwarding via Store Queue Index Prediction
37 Thermal Management of On-Chip Caches Through Power Density Minimization
36 Shader Performance Analysis on a Modern GPU Architecture
34 A Quantum Logic Array Microarchitecture: Scalable Quantum Data Movement and Computation
33 Address-Indexed Memory Disambiguation and Store-to-Load Forwarding
33 ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing
32 Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors
29 The Cell Processor Architecture
28 Pinot: Speculative Multi-threading Processor Architecture Exploiting Parallelism over a Wide Range of Granularities
27 Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution
26 A Criticality Analysis of Clustering in Superscalar Processors
25 Improving Region Selection in Dynamic Optimization Systems
25 Store Memory-Level Parallelism Optimizations for Commercial Applications
25 μComplexity: Estimating Processor Design Effort
25 Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System
23 Continuous Path and Edge Profiling
21 Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns
21 Flea-flicker Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense
20 How to Fake 1000 Registers
17 Exploiting Vector Parallelism in Software Pipelined Loops
13 Reducing Instruction Fetch Cost by Packing Instructions into Register Windows
12 Balancing Resource Utilization to Mitigate Power Density in Processor Pipelines
10 The Future Evolution of High-Performance Microprocessors
6 Efficient Use of Invisible Registers in Thumb Code
5 Incremental Commit Groups for Non-Atomic Trace Processing