143 Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0
132 Flattened Butterfly Topology for On-Chip Networks
128 Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
111 Revisiting the Sequential Programming Model for Multi-Core
96 Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow
92 A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs
85 Argus: Low-Cost, Comprehensive Error Detection in Simple Cores
82 Implementing Signatures for Transactional Memory
78 Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding
73 Composable Lightweight Processors
62 FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators
62 A Framework for Providing Quality of Service in Chip Multi-Processors
61 Penelope: The NBTI-Aware Processor
58 Software-Based Online Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation
55 Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing
51 Self-calibrating Online Wearout Detection
40 Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs
40 Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures
36 Process Variation Tolerant 3T1D-Based Cache Architectures
31 Using Address Independent Seed Encryption and Bonsai Merkle Trees to Make Secure Processors OS- and Performance-Friendly
27 Microarchitectural Design Space Exploration Using an Architecture-Centric Approach
25 Scavenger: A New Last Level Cache Architecture with Global Block Priority
24 Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors
23 Emulating Optimal Replacement with a Shepherd Cache
21 Leveraging 3D Technology for Improved Reliability
20 Impact of Cache Coherence Protocols on the Processing of Network Traffic
17 Effective Optimistic-Checker Tandem Core Design through Architectural Pruning
16 Low-Cost Epoch-Based Correlation Prefetching for Commercial Applications
16 A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy
16 Guaranteeing Hits to Improve the Efficiency of a Small Instruction Cache
13 Global Multi-Threaded Instruction Scheduling
10 Time Interpolation: So Many Metrics, So Few Registers
8 Informed Microarchitecture Design Space Exploration Using Workload Dynamics
7 Optimal versus Heuristic Global Code Scheduling
5 The Art of Deception: Adaptive Precision Reduction for Area Efficient Physics Acceleration