229 Design tradeoffs for tiled CMP on-chip networks
170 A case for high performance computing with virtual machines
58 Cooperative checkpointing: a robust approach to large-scale systems reliability
53 Design space exploration for multicore architectures: a power/performance/thermal view
49 Accelerating sparse matrix computations via data compression
48 Online power-performance adaptation of multithreaded programs using hardware event-based prediction
45 STAR-MPI: self tuned adaptive routines for MPI collective operations
37 On the performance potential of different types of speculative thread-level parallelism
37 Probabilistic accuracy bounds for fault-tolerant computations that discard tasks
35 Accelerator design for protein sequence HMM search
29 Large files, small writes, and pNFS
28 Violated dependence analysis
27 Profitable loop fusion and tiling using model-driven empirical search
24 MPIPP: an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters
22 Scalable algorithms for global snapshots in distributed systems
16 Heterogeneous way-size cache
14 Scalable, fault tolerant membership for MPI tasks on HPC systems
14 TMA: a trap-based memory architecture
13 Experimental evaluation of application-level checkpointing for OpenMP programs
12 BranchTap: improving performance with very few checkpoints through adaptive speculation control
12 Scientific applications vs. SPEC-FP: a comparison of program behavior
11 Accurate memory data flow modeling in statistical simulation
10 The exigency of benchmark and compiler drift: designing tomorrow's processors with yesterday's tools
10 Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors
10 A distributed system based on web services for computational science simulations
9 Efficient remote block-level I/O over an RDMA-capable NIC
9 A scalable communication layer for multi-dimensional hyper crossbar network using multiple gigabit ethernet
8 Coupling prefix caching and collective downloads for remote dataset access
8 Feedback-directed memory disambiguation through store distance analysis
7 Selective predicate prediction for out-of-order processors
7 User-guided symbiotic space-sharing of real workloads
7 Lightweight lock-free synchronization methods for multithreading
6 A scalable low power issue queue for large instruction window processors
6 Scaling MPI to short-memory MPPs such as BG/L
3 A modern high-performance processor pipeline
3 Implementing virtual memory in a vector processor with software restart markers
3 Sensitivity analysis of knapsack-based task scheduling on the grid
1 Wide and efficient trace prediction using the local trace predictor
0 Quantum mechanical approaches to information processing