

140  Implementing Sparse Matrix-Vector Multiplication on Throughput-Oriented Processors
75  PLFS: A Checkpoint Filesystem for Parallel Applications 
48  Dynamic Task Scheduling for Linear Algebra Algorithms on Distributed-Memory Multicore Systems
44  Millisecond-Scale Molecular Dynamics Simulations on Anton
44  I/O Performance Challenges at Leadership Scale 
43  The Cat is Out of the Bag: Cortical Simulations with 10^9 Neurons, 10^13 Synapses 
43  Auto-Tuning 3D FFT Library for CUDA GPUs
42  Scalable Work Stealing 
40  Comparative Study of One-Sided Factorizations with Multiple Software Packages on Multi-Core Hardware
36  Leveraging 3D PCRAM Technologies to Reduce Checkpoint Overhead for Future Exascale Systems 
34  42 TFlops Hierarchical N-body Simulations on GPUs with Applications in both Astrophysics and Turbulence
33  Minimizing Communication in Sparse Matrix Solvers 
30  VGrADS: Enabling e-Science Workflows on Grids and Clouds with Fault Tolerance
25  HyperX: Topology, Routing, and Packaging of Efficient Large-Scale Networks
24  A Massively Parallel Adaptive Fast-Multipole Method on Heterogeneous Architectures
22  Scalable Implicit Finite Element Solver for Massively Parallel Processing with Demonstration to 160K cores 
20  Towards a Framework for Abstracting Accelerators in Parallel Applications: Experience with Cell 
19  Diagnosing Performance Bottlenecks in Emerging Petascale Applications 
19  Future Scaling of Processor-Memory Interfaces
19  GridBot: Execution of Bags of Tasks in Multiple Grids 
19  Scalable Massively Parallel I/O to Task-Local Files
18  Increasing Memory Miss Tolerance for SIMD Cores 
18  Liquid Water: Obtaining the Right Answer for the Right Reasons 
15  PFunc: Modern Task Parallelism for Modern High Performance Computing 
15  SmartStore: A New Metadata Organization Paradigm with Semantic-Awareness for Next-Generation File Systems
14  Memory-Efficient Optimization of Gyrokinetic Particle-to-Grid Interpolation for Multicore Processors
14  Scalable Computation of Streamlines on Very Large Datasets 
13  Terascale Data Organization for Discovering Multivariate Climatic Trends 
12  Sparse Matrix Factorization on Massively Parallel Computers 
12  A Configurable Algorithm for Parallel Image-Compositing Applications
12  Autotuning Multigrid with PetaBricks 
11  Instruction-Level Simulation of a Cluster at Scale
11  Age Based Scheduling for Asymmetric Multiprocessors 
10  Automating the Generation of Composed Linear Algebra Kernels 
10  Router Designs for Elastic Buffer On-Chip Networks
10  Allocator Implementations for Network-on-Chip Routers
10  Improving GridFTP Performance Using The Phoebus Session Layer 
10  Multicore Acceleration of Chemical Kinetics for Simulation and Prediction 
9  Adaptive and Scalable Metadata Management to Support A Trillion Files 
9  On the Design of Scalable, Self-Configuring Virtual Networks
8  Optimal Real Number Codes for Fault Tolerant Matrix Operations 
8  FACT: Fast Communication Trace Collection for Parallel Applications through Program Slicing 
8  Space-Efficient Time-Series Call-Path Profiling of Parallel Applications
8  Early Performance Evaluation of "Nehalem" Cluster using Scientific and Engineering Applications 
7  A Case for Integrated Processor-Cache Partitioning in Chip Multiprocessors
7  Enabling Software Management for Multicore Caches with a Lightweight Hardware Support 
7  Predicting the Execution Time of Grid Workflow Applications through Local Learning 
7  Indexing Genomic Sequences on the IBM Blue Gene 
7  Evaluating Similarity-Based Trace Reduction Techniques for Scalable Performance Analysis
7  Triangular Matrix Inversion on Graphics Processing Units 
7  Scalable Temporal Order Analysis for Large Scale Debugging 
6  Machine Learning-Based Prefetch Optimization for Data Center Applications
6  A Design Methodology for Domain-Optimized Power-Efficient Supercomputing
6  A 32x32x32, Spatially Distributed 3D FFT in Four Microseconds on Anton 
6  Performance Evaluation of NEC SX-9 using Real Science and Engineering Applications
6  Enabling High-Fidelity Neutron Transport Simulations on Petascale Architectures
5  Compact Multi-Dimensional Kernel Extraction for Register Tiling
5  A Microdriver Architecture for Error Correcting Codes inside the Linux Kernel 
5  Beyond Homogeneous Decomposition: Scaling Long-Range Forces on Massively Parallel Architectures
4  Evaluating the Impact of Inaccurate Information in Utility-Based Scheduling
4  FALCON: A System for Reliable Checkpoint Recovery in Shared Grid Environments 
4  Flexible Cache Error Protection using an ECC FIFO 
3  Dynamic Storage Cache Allocation in Multi-Server Architectures
3  Supporting Fault-Tolerance for Time-Critical Events in Distributed Environments
2  SCAMPI: A Scalable Cam-based Algorithm for Multiple Pattern Inspection
2  Efficient Band Approximation of Gram Matrices for Large Scale Kernel Methods on GPUs 
2  A Scalable Method for Ab Initio Computation of Free Energies in Nanoscale Systems 