332 Evaluating MapReduce for Multi-core and Multiprocessor Systems
210 LogTM-SE: Decoupling Hardware Transactional Memory from Caches
106 Concurrent Direct Network Access for Virtual Machine Monitors
89 Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors
88 A Scalable, Non-blocking Approach to Transactional Memory
71 Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers
69 An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors
59 Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications
58 A Burst Scheduling Access Reordering Mechanism
55 HARD: Hardware-Assisted Lockset-based Race Detection
51 Perturbation-based Fault Screening
50 Application-Level Correctness and its Impact on Fault Tolerance
48 MemTracker: Efficient and Programmable Support for Memory Access Monitoring and Debugging
47 Illustrative Design Space Studies with Microarchitectural Regression Models
35 Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling
31 An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing
28 Colorama: Architectural Support for Data-Centric Synchronization
28 A Memory-Level Parallelism Aware Fetch Policy for SMT Processors
27 Modeling and Managing Thermal Profiles of Rack-mounted Servers with ThermoStat
25 A Domain-Specific On-Chip Network Design for Large Scale Cache Systems
23 Error Detection via Online Checking of Cache Coherence with Token Coherence Signatures
23 A Low Overhead Fault Tolerant Coherence Protocol for CMP Architectures
23 Interactions Between Compression and Prefetching in Chip Multiprocessors
21 Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines
20 Liquid SIMD: Abstracting SIMD Hardware using Lightweight Dynamic Mapping
19 Exploiting Postdominance for Speculative Parallelization
17 Accelerating and Adapting Precomputation Threads for Efficient Prefetching
13 Optical Interconnect Opportunities for Future Server Memory Systems
11 Implications of Device Timing Variability on Full Chip Timing
7 Improving Branch Prediction and Predicated Execution in Out-of-Order Processors
3 Interconnect-Centric Computing
1 Petascale Computing Research Challenges - A Manycore Perspective