| | |
| --- | --- |
| 134 | Debunking the 100X GPU vs CPU myth: an evaluation of throughput computing on CPU and GPU |
| 50 | An integrated GPU power and performance model |
| 48 | Understanding sources of inefficiency in general-purpose chips |
| 41 | High performance cache replacement using re-reference interval prediction (RRIP) |
| 33 | Web search using mobile cores: quantifying and mitigating the price of efficiency |
| 33 | Energy proportional datacenter networks |
| 33 | NoHype: virtualized cloud infrastructure without the virtualization |
| 32 | Rethinking DRAM design and organization for energy-constrained multi-cores |
| 27 | Security refresh: prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping |
| 24 | Dynamic warp subdivision for integrated branch and memory divergence tolerance |
| 23 | Use ECP, not ECC, for hard failures in resistive memories |
| 22 | Conflict exceptions: simplifying concurrent language semantics with precise hardware exceptions for data-races |
| 20 | An intra-chip free-space optical interconnect |
| 20 | Re-architecting DRAM memory systems with monolithically integrated silicon photonics |
| 20 | Morphable memory system: a robust architecture for exploiting multi-level phase change memories |
| 20 | Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing |
| 20 | Relax: an architectural framework for software recovery of hardware faults |
| 19 | Aérgia: exploiting packet latency slack in on-chip networks |
| 19 | A case for FAME: FPGA architecture model execution |
| 19 | The impact of management operations on the virtualized datacenter |
| 18 | Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis |
| 17 | Evolution of thread-level parallelism in desktop applications |
| 17 | Modeling critical sections in Amdahl's law and its implications for multicore design |
| 16 | Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors |
| 15 | Silicon-photonic network architectures for scalable, power-efficient multi-chip systems |
| 14 | Reducing cache power with low-cost, multi-bit error-correcting codes |
| 13 | ColorSafe: architectural support for debugging and dynamically avoiding multi-variable atomicity violations |
| 12 | SieveStore: a highly-selective, ensemble-level disk cache for cost-performance |
| 12 | RETCON: transactional repair without replay |
| 10 | WiDGET: Wisconsin decoupled grid execution tiles |
| 9 | The virtual write queue: coordinating DRAM and last-level cache policies |
| 9 | Data marshaling for multi-core architectures |
| 7 | Translation caching: skip, don't walk (the page table) |
| 6 | Forwardflow: a scalable core for power-constrained CMPs |
| 6 | LReplay: a pending period based deterministic replay scheme |
| 6 | Necromancer: enhancing system throughput by animating dead cores |
| 5 | Timetraveler: exploiting acyclic races for optimizing memory race recording |
| 5 | Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications |
| 5 | Cohesion: a hybrid memory model for accelerators |
| 5 | Using hardware vulnerability factors to enhance AVF analysis |
| 4 | A dynamically configurable coprocessor for convolutional neural networks |
| 4 | Leveraging the core-level complementary effects of PVT variations to reduce timing emergencies in multi-core processors |
| 1 | Sentry: light-weight auxiliary memory access control |
| 0 | IVEC: off-chip memory integrity protection for both security and reliability |