1847 Search and replication in unstructured peer-to-peer networks
130 Critical power slope: understanding the runtime effects of frequency scaling
106 A network-failure-tolerant message-passing system for terascale clusters
82 The architecture of the DIVA processing-in-memory chip
73 BProc: the Beowulf distributed process space
62 Bloom filtering cache misses for accurate data speculation and prefetching
57 DualFS: a new journaling file system without meta-data duplication
53 Hybrid analysis: static & dynamic memory reference analysis
46 A voxel-based parallel collision detection algorithm
42 Markov model prediction of I/O requests for scientific applications
37 Low-complexity reorder buffer architecture
36 A comparative study of modulo scheduling techniques
32 Latency and energy aware value prediction for high-frequency processors
26 Leveraging cache coherence in active memory systems
26 Profile-guided post-link stride prefetching
26 Dual path instruction processing
21 Execution history guided instruction prefetching
19 An interleaved cache clustered VLIW processor
16 A deterministic fault-tolerant and deadlock-free routing protocol in 2-D meshes based on odd-even turn model
15 Computation regrouping: restructuring programs for temporal data cache locality
15 Near-optimal adaptive control of a large grid application
14 Parallelization and performance of 3D ultrasound imaging beamforming algorithms on modern clusters
13 Active buffering plus compressed migration: an integrated solution to parallel simulations' data transport needs
13 Instance-wise points-to analysis for loop-based dependence testing
11 Experiences tuning SMG98: a semicoarsening multigrid benchmark based on the hypre library
10 Compiler supported high-level abstractions for sparse disk-resident datasets
8 Affinity-based cluster assignment for unrolled loops
7 Can the Earth Simulator change the way humans think?
7 Heterogeneous multi-computer system: a new platform for multi-paradigm scientific simulation
3 Challenges and opportunities in autonomic computing
2 Using predicate path information in hardware to determine true dependences
1 Optimal software pipelining of loops with control flows
0 Boosting trace cache performance with nonhead miss speculation
0 Clustered approaches to HPC via commodity HW + highly evolved SW