268 Optimization of Sparse Matrix-vector Multiplication on Emerging Multicore Platforms
159 Falkon: Fast and Light-weight tasK executiON framework
92 Efficient Operating System Scheduling for Performance-asymmetric Multi-core Architectures
76 Exploring Event Correlation for Failure Prediction in Coalitions of Clusters
60 Implementation and Performance Analysis of Non-blocking Collective Operations for MPI
59 Anatomy of a Cortical Simulator
57 Efficient Gather and Scatter Operations on Graphics Processors
56 Inter-operating Grids through Delegated MatchMaking
52 Virtual Machine Aware Communication Libraries for High Performance Computing
51 Large-scale Maximum Likelihood-based Phylogenetic Analysis on the IBM BlueGene/L
47 Cray XT4: An Early Evaluation for Petascale Scientific Simulation
46 User-friendly and Reliable Grid Computing Based on Imperfect Middleware
43 Bounding Energy Consumption in Large-scale MPI Programs
35 The Cray BlackWidow: A Highly Scalable Vector Multiprocessor
29 GRAPE-DR: 2-Pflops Massively-Parallel Computer with 512-Core, 512-Gflops Processor Chips for Scientific Computing
29 The Ghost in the Machine: Observing the Effects of Kernel Operation on Parallel Application Performance
28 Multi-threading and One-sided Communication in Parallel LU Factorization
28 Multi-level Tiling: M for the Price of One
26 P^nMPI Tools: A Whole Lot Greater than the Sum of Their Parts
26 WRF nature run
25 Automatic Resource Specification Generation for Resource Selection
25 Evaluation of Active Storage Strategies for the Lustre Parallel File System
24 A Genetic Algorithms Approach to Modeling the Performance of Memory-bound Computations
24 Advanced Data Flow Support for Scientific Grid Workflow Applications
23 Extending stability beyond CPU millennium: a micron-scale atomistic simulation of Kelvin-Helmholtz instability
21 Scalable Security for Petascale Parallel File Systems
21 Investigation of Leading HPC I/O Performance using a Scientific-application-derived Benchmark
20 Integrating Parallel File Systems with Object-based Storage Devices
19 High-performance Ethernet-based Communications for Future Multi-core Processors
18 Performance under Failure of High-end Computing
18 Parallel Hierarchical Visualization of Large Time-varying 3D Vector Fields
18 Application Development on Hybrid Systems
16 Optimizing Center Performance through Coordinated Data Staging, Scheduling and Recovery
16 Data Access History Cache and Associated Data Prefetching Mechanisms
16 Age-Based Packet Arbitration in Large-Radix k-ary n-cubes
15 RobuSTore: A Distributed Storage Architecture with Robust and High Performance
15 Data Exploration of Turbulence Simulations using a Database Cluster
14 A Job Scheduling Framework for Large Computing Farms
13 Using MPI File Caching to Improve Parallel Write Performance for Large-scale Scientific Applications
12 DMTracker: Finding Bugs in Large-scale Parallel Programs by Detecting Anomaly in Data Movements
12 Low-Constant Parallel Algorithms for Finite Element Simulations using Linear Octrees
12 Performance and Cost Optimization for Multiple Large-scale Grid Workflow Applications
12 Anomaly Detection and Diagnosis in Grid Environments
10 Noncontiguous Locking Techniques for Parallel File Systems
10 Analyzing the Impact of Supporting Out-of-order Communication on In-order Performance with iWARP
7 Evaluating NIC Hardware Requirements to Achieve High Message Rate PGAS Support on Multi-Core Processors
7 Scaling Performance of Interior-Point Method on Large-Scale Chip Multiprocessor System
7 Performance Adaptive Power-aware Reconfigurable Optical Interconnects for HPC Systems
7 An Adaptive Mesh Refinement Benchmark for Modern Parallel Programming Languages
6 Automatic Software Interference Detection in Parallel Applications
6 Workstation Capacity Tuning using Reinforcement Learning
5 A Case for Low-complexity MP Architectures
5 Evaluating Network Information Models on Resource Efficiency and Application Performance in Lambda-Grids
4 A User-level Secure Grid File System
4 A Preliminary Investigation of a Neocortex Model Implementation on the Cray XD1
4 A 281 Tflops calculation for X-ray protein structure analysis with special-purpose computers MDGRAPE-3
2 First-principles calculations of large-scale semiconductor systems on the earth simulator
1 Variable Latency Caches for Nanoscale Processor