7 | GreenSlot: Scheduling Energy Consumption in Green Datacenters |

7 | Checkpointing strategies for parallel jobs |

7 | Evaluating the Viability of Process Replication Reliability for Exascale Systems |

6 | Enabling and Scaling Biomolecular Simulations of 100 Million Atoms on Petascale Machines with a Multicore-optimized Message-driven Runtime |

6 | Improving Communication Performance in Dense Linear Algebra via Topology Aware Collectives |

5 | Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers |

5 | Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels |

5 | Purlieus: Locality-aware Resource Allocation for MapReduce in a Cloud |

5 | Parallel Breadth-First Search on Distributed Memory Systems |

4 | Topology-aware data movement and staging for I/O acceleration on Blue Gene/P supercomputing systems |

4 | Reducing Electricity Cost Through Virtual Machine Placement in High Performance Computing Clouds |

4 | High-Efficiency Server Design |

4 | Modeling and Tolerating Heterogeneous Failures in Large Parallel Systems |

4 | SciHadoop: Array-based Query Processing in Hadoop |

4 | Scalable Stochastic Optimization of Complex Energy Systems |

3 | CudaDMA: Optimizing GPU Memory Bandwidth via Warp Specialization |

3 | Simplified Parallel Domain Traversal |

3 | Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers |

3 | Server-Side I/O Coordination for Parallel File Systems |

3 | A 'Cool' Load Balancer for Parallel Applications |

3 | Parallel Random Numbers: As Easy as 1, 2, 3 |

3 | Fast Implementation of DGEMM on Fermi GPU |

3 | SCMFS: A File System for Storage Class Memory |

3 | BlobCR: Efficient Checkpoint-Restart for HPC Applications on IaaS Clouds using Virtual Disk Image Snapshots |

3 | Sniper: Exploring the Level of Abstraction for Scalable and Accurate Parallel Multi-Core Simulation |

3 | Copernicus: A New Paradigm for Parallel Adaptive Molecular Dynamics |

2 | Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs |

2 | Tiled QR factorization algorithms |

2 | Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems |

2 | Peta-scale Phase-Field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer |

2 | The IBM Blue Gene/Q Interconnection Network and Message Unit |

2 | I/O Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence |

2 | Parallel Index and Query for Large Scale Data Analysis |

2 | Using the TOP500 to Trace and Project Technology and Architecture Trends |

2 | FTI: high performance Fault Tolerance Interface for hybrid systems |

2 | Scalable fast multipole methods on distributed heterogeneous architectures |

2 | A new Computational Paradigm in Multiscale Simulations: Application to Brain Blood Flow |

2 | System Implications of Memory Reliability in Exascale Computing |

2 | Multithreaded Global Address Space Communication Techniques for Gyrokinetic Fusion Applications on Ultra-Scale Platforms |

2 | Optimizing the Barnes-Hut Algorithm in UPC |

2 | Hardware, Software Co-design for Energy Efficient Seismic Modeling |

1 | GROPHECY: GPU Performance Projection from CPU Code Skeletons |

1 | QoS Support for End Users of I/O-intensive Applications using Shared Storage Systems |

1 | Gyrokinetic Toroidal Simulations on Leading Multi- and Manycore HPC Systems |

1 | Multi-Science Applications with Single Codebase - GAMER - for Massively Parallel Architectures |

1 | Optimized Pre-Copy Live Migration for Memory Intensive Applications |

1 | TRACON: Interference-Aware Scheduling for Data-Intensive Applications in Virtualized Environments |

1 | Flexible Resource Allocation for Reliable Virtual Cluster Computing Systems |

1 | Auto-Scaling to Minimize Cost and Meet Application Deadlines in Cloud Workflows |

1 | Large Scale Debugging of Parallel Tasks with AutomaDeD |

1 | Performance of the Community Earth System Model |

1 | On the Duality of Data-intensive File System Design: Reconciling HDFS and PVFS |

1 | Scalable Implementations of Accurate Excited-state Coupled Cluster Theories: Application of High-level Methods to Porphyrin-based Systems |

0 | Unitary Qubit Lattice Simulations of Multiscale Phenomena in Quantum Turbulence |

0 | An Image Compositing Solution at Scale |

0 | ISABELA-QA: Query-driven Data Analytics over ISABELA-compressed Extreme-Scale Scientific Data |

0 | Virtual I/O caching: dynamic storage cache management for concurrent workloads |

0 | Scalable Hashing for Shared Memory Supercomputers |

0 | An Early Performance Analysis of POWER7-IH HPC Systems |

0 | A Similarity Measure for Time, Frequency, and Dependencies in Large-Scale Workloads |

0 | Efficient Data Race Detection for Distributed Memory Parallel Programs |

0 | MAximum Multicore POwer (MAMPO) - An Automatic Multithreaded Synthetic Power Virus Generation Framework for Multicore Systems |

0 | Hadoop Acceleration Through Network Levitated Merge |

0 | Extracting Ultra-Scale Lattice Boltzmann Performance via Hierarchical and Distributed Auto-Tuning |

0 | Highly Scalable Ab Initio Genomic Motif Identification |

0 | A Distributed Look-up Architecture for Text Mining Applications using MapReduce |

0 | Parallelization Design for Multi-core Platforms in Density Matrix Renormalization Group toward 2-D Quantum Strongly-correlated Systems |

0 | A Scalable Eigensolver for Large Scale-Free Graphs Using 2D Partitioning |

0 | High-Performance Lattice QCD for Multi-core Based Parallel Systems Using a Cache-Friendly Hybrid Threaded-MPI Approach |

0 | Scaling Lattice QCD beyond 100 GPUs |

0 | End-to-End Network QoS via Scheduling of Flexible Resource Reservation Requests |

0 | Large Scale Plane Wave Pseudopotential Density Functional Theory Calculations on GPU Clusters |

0 | Avoiding hot-spots on two-level direct networks |

0 | A Fast Solver for Modeling the Evolution of Virus Populations |