| | |
|---|---|
| 229 | Design tradeoffs for tiled CMP on-chip networks |
| 170 | A case for high performance computing with virtual machines |
| 58 | Cooperative checkpointing: a robust approach to large-scale systems reliability |
| 53 | Design space exploration for multicore architectures: a power/performance/thermal view |
| 49 | Accelerating sparse matrix computations via data compression |
| 48 | Online power-performance adaptation of multithreaded programs using hardware event-based prediction |
| 45 | STAR-MPI: self tuned adaptive routines for MPI collective operations |
| 37 | On the performance potential of different types of speculative thread-level parallelism |
| 37 | Probabilistic accuracy bounds for fault-tolerant computations that discard tasks |
| 35 | Accelerator design for protein sequence HMM search |
| 29 | Large files, small writes, and pNFS |
| 28 | Violated dependence analysis |
| 27 | Profitable loop fusion and tiling using model-driven empirical search |
| 24 | MPIPP: an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters |
| 22 | Scalable algorithms for global snapshots in distributed systems |
| 16 | Heterogeneous way-size cache |
| 14 | Scalable, fault tolerant membership for MPI tasks on HPC systems |
| 14 | TMA: a trap-based memory architecture |
| 13 | Experimental evaluation of application-level checkpointing for OpenMP programs |
| 12 | BranchTap: improving performance with very few checkpoints through adaptive speculation control |
| 12 | Scientific applications vs. SPEC-FP: a comparison of program behavior |
| 11 | Accurate memory data flow modeling in statistical simulation |
| 10 | The exigency of benchmark and compiler drift: designing tomorrow's processors with yesterday's tools |
| 10 | Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors |
| 10 | A distributed system based on web services for computational science simulations |
| 9 | Efficient remote block-level I/O over an RDMA-capable NIC |
| 9 | A scalable communication layer for multi-dimensional hyper crossbar network using multiple gigabit ethernet |
| 8 | Coupling prefix caching and collective downloads for remote dataset access |
| 8 | Feedback-directed memory disambiguation through store distance analysis |
| 7 | Selective predicate prediction for out-of-order processors |
| 7 | User-guided symbiotic space-sharing of real workloads |
| 7 | Lightweight lock-free synchronization methods for multithreading |
| 6 | A scalable low power issue queue for large instruction window processors |
| 6 | Scaling MPI to short-memory MPPs such as BG/L |
| 3 | A modern high-performance processor pipeline |
| 3 | Implementing virtual memory in a vector processor with software restart markers |
| 3 | Sensitivity analysis of knapsack-based task scheduling on the grid |
| 1 | Wide and efficient trace prediction using the local trace predictor |
| 0 | Quantum mechanical approaches to information processing |