| | Title |
| --- | --- |
| 143 | Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 |
| 132 | Flattened Butterfly Topology for On-Chip Networks |
| 128 | Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors |
| 111 | Revisiting the Sequential Programming Model for Multi-Core |
| 96 | Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow |
| 92 | A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs |
| 85 | Argus: Low-Cost, Comprehensive Error Detection in Simple Cores |
| 82 | Implementing Signatures for Transactional Memory |
| 78 | Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding |
| 73 | Composable Lightweight Processors |
| 62 | FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators |
| 62 | A Framework for Providing Quality of Service in Chip Multi-Processors |
| 61 | Penelope: The NBTI-Aware Processor |
| 58 | Software-Based Online Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation |
| 55 | Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing |
| 51 | Self-calibrating Online Wearout Detection |
| 40 | Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs |
| 40 | Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures |
| 36 | Process Variation Tolerant 3T1D-Based Cache Architectures |
| 31 | Using Address Independent Seed Encryption and Bonsai Merkle Trees to Make Secure Processors OS- and Performance-Friendly |
| 27 | Microarchitectural Design Space Exploration Using an Architecture-Centric Approach |
| 25 | Scavenger: A New Last Level Cache Architecture with Global Block Priority |
| 24 | Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors |
| 23 | Emulating Optimal Replacement with a Shepherd Cache |
| 21 | Leveraging 3D Technology for Improved Reliability |
| 20 | Impact of Cache Coherence Protocols on the Processing of Network Traffic |
| 17 | Effective Optimistic-Checker Tandem Core Design through Architectural Pruning |
| 16 | Low-Cost Epoch-Based Correlation Prefetching for Commercial Applications |
| 16 | A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy |
| 16 | Guaranteeing Hits to Improve the Efficiency of a Small Instruction Cache |
| 13 | Global Multi-Threaded Instruction Scheduling |
| 10 | Time Interpolation: So Many Metrics, So Few Registers |
| 8 | Informed Microarchitecture Design Space Exploration Using Workload Dynamics |
| 7 | Optimal versus Heuristic Global Code Scheduling |
| 5 | The Art of Deception: Adaptive Precision Reduction for Area Efficient Physics Acceleration |