

16  Singlehop Collaborative Feedback Primitives for Threshold Querying in Wireless Sensor Networks 
16  QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators 
13  Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures
12  PHAST: Hardware-Accelerated Shortest Path Trees
10  PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures 
10  Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications
10  Multi-GPU MapReduce on GPU Clusters
9  Challenges of Scaling Algebraic Multigrid Across Modern Multicore Architectures 
9  A Study of Parallel Particle Tracing for Steady-State and Time-Varying Flow Fields
8  Hauberk: Lightweight Silent Data Corruption Error Detector for GPGPU 
8  Graph Partitioning with Natural Cuts 
6  Communication-Avoiding QR Decomposition for GPUs
6  CATCH: A Cloud-Based Adaptive Data Transfer Service for HPC
5  Willow: A Control System for Energy and Thermal Adaptive Computing 
5  Measuring Temporal Lags in Delay-Tolerant Networks
5  Automatic Library Generation for BLAS3 on GPUs 
5  Power Token Balancing: Adapting CMPs to Power Constraints for Parallel Multithreaded Workloads 
5  Adding a Referee to an Interconnection Network: What Can(not) Be Computed in One Round 
5  An Autotuned Method for Solving Large Tridiagonal Systems on the GPU 
4  Iso-Energy-Efficiency: An Approach to Power-Constrained Parallel Computation
4  Single Node On-Line Simulation of MPI Applications with SMPI
4  X10 as a Parallel Language for Scientific Computation: Practice and Experience 
4  DryadOpt: Branch-and-Bound on Distributed Data-Parallel Execution Engines
3  Minimum Cost Resource Allocation for Meeting Job Requirements 
3  Overlapping Computation and Communication for Advection on Hybrid Parallel Computers 
3  On Nonblocking Folded-Clos Networks in Computer Communication Environments
3  A Lightweight Method for Automated Design of Convergence 
3  Distributed Fine-Grained Access Control in Wireless Sensor Networks
3  Design of MILC Lattice QCD Application for GPU Clusters 
3  Automated Architecture-Aware Mapping of Streaming Applications Onto GPUs
3  Computing Strongly Connected Components in Parallel on CUDA 
3  A New Data Layout for Set Intersection on GPUs 
3  Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication
3  Using Shared Memory to Accelerate MapReduce on Graphics Processing Units 
3  Co-analysis of RAS Log and Job Log on Blue Gene/P
3  CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications 
3  Communication Optimizations for Distributed-Memory X10 Programs
3  A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms 
2  Power-Aware Replica Placement and Update Strategies in Tree Networks
2  Architectural Constraints to Attain 1 Exaflop/s for Three Scientific Application Classes 
2  Snap-Stabilizing Committee Coordination
2  A Practical Approach for Performance Analysis of Shared-Memory Programs
2  Moving the Code to the Data - Dynamic Code Deployment Using ActiveSpaces
2  LACIO: A New Collective I/O Strategy for Parallel I/O Systems 
2  A Quantitative Analysis of OS Noise 
2  GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs
2  Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms
2  Model-Driven SIMD Code Generation for a Multiresolution Tensor Kernel
2  Tight Analysis of Relaxed Multiorganization Scheduling Algorithms 
1  Exploiting Data Similarity to Reduce Memory Footprints 
1  vFtree - A Fat-Tree Routing Algorithm Using Virtual Lanes to Alleviate Congestion
1  SCOA: A Secure and Efficient Scheme for Origin Authentication of Interdomain Routing in Cloud Computing Networks 
1  Completely Distributed Particle Filters for Target Tracking in Sensor Networks 
1  Hardware-Based Job Queue Management for Manycore Architectures and OpenMP Environments
1  HK-NUCA: Boosting Data Searches in Dynamic Non-Uniform Cache Architectures for Chip Multiprocessors
1  Variable Granularity Access Tracking Scheme for Improving the Performance of Software Transactional Memory 
1  The Weighted Byzantine Agreement Problem 
1  On Optimal Tree Traversals for Sparse Matrix Factorization 
1  Critical Bubble Scheme: An Efficient Implementation of Globally Aware Network Flow Control 
1  RDMA Capable iWARP over Datagrams 
1  Partitioning Spatially Located Computations Using Rectangles 
1  Reducing Fragmentation on Torus-Connected Supercomputers
1  Online Adaptive Code Generation and Tuning 
1  A Communication-Avoiding, Hybrid-Parallel, Rank-Revealing Orthogonalization Method
1  Scheduling Parallel Iterative Applications on Volatile Resources 
1  The Impact of Soft Resource Allocation on n-Tier Application Scalability
1  Implementation and Performance Evaluation of the HPC Challenge Benchmarks in Coarray Fortran 2.0 
1  Parallel Metagenomic Sequence Clustering Via Sketching and Maximal Quasiclique Enumeration on MapReduce Clouds 
1  CABdedupe: A Causality-Based Deduplication Performance Booster for Cloud Backup Services
0  Power and Performance Management in Priority-Type Cluster Computing Systems
0  VisIO: Enabling Interactive Visualization of Ultra-Scale, Time Series Data via High-Bandwidth Distributed I/O Systems
0  A Novel Power Management for CMP Systems in Data-Intensive Environment
0  Characterization of System Services and Their Performance Impact in Multicore Nodes 
0  Automatic Recognition of Performance Idioms in Scientific Applications 
0  A Study of Speculative Distributed Scheduling on the Cell/B.E. 
0  The Evaluation of an Effective Out-of-Core Run-Time System in the Context of Parallel Mesh Generation
0  Enriching 3D Video Games on Multicores 
0  Redesign of Higher-Level Matrix Algorithms for Multicore and Distributed Architectures and Applications in Quantum Monte Carlo Simulation
0  A Performance and Area Efficient Architecture for Intrusion Detection Systems 
0  Time-Ordered Event Traces: A New Debugging Primitive for Concurrency Bugs
0  Connectivity Tradeoffs in 3D Wireless Sensor Networks Using Directional Antennae 
0  Multifrontal Factorization of Sparse SPD Matrices on GPUs 
0  Large-Scale Semantic Concept Detection on Manycore Platforms for Multimedia Mining
0  Efficient GPU Implementation for Particle in Cell Algorithm 
0  A Very Fast Simulator for Exploring the Many-Core Future
0  Automatic Loop Tiling for Direct Memory Access 
0  Tolerant Value Speculation in Coarse-Grain Streaming Computations
0  Improved Algorithms for the Distributed Trigger Counting Problem 
0  Leveraging Social Networks to Combat Collusion in Reputation Systems for Peer-to-Peer Networks
0  Fast Community Detection Algorithm with GPUs and Multicore Architectures 
0  A Scalable Reverse Lookup Scheme Using Group-Based Shifted Declustering Layout
0  Deadlock-Free Oblivious Routing for Arbitrary Topologies
0  Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs
0  Optimizing Large-Scale Graph Analysis on a Multithreaded, Multicore Platform
0  GRAL: A Grouping Algorithm to Optimize Application Placement in Wireless Embedded Systems 
0  Vitis: A Gossip-based Hybrid Overlay for Internet-scale Publish/Subscribe Enabling Rendezvous Routing in Unstructured Overlay Networks
0  High Performance Scalable and Expressive Modeling Environment to Study Mobile Malware in Large Dynamic Networks 
0  H-Code: A Hybrid MDS Array Code to Optimize Partial Stripe Writes in RAID-6
0  Unified Signatures for Improving Performance in Transactional Memory 
0  Flease - Lease Coordination Without a Lock Server
0  Minimal Obstructions for the Coordinated Attack Problem and Beyond 
0  Shared Resource Monitoring and Throughput Optimization in Cloud-Computing Datacenters
0  Profiling Directed NUMA Optimization on Linux Systems: A Case Study of the Gaussian Computational Chemistry Code 
0  I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors
0  Reader Activation Scheduling in Multireader RFID Systems: A Study of General Case 
0  Efficient Parallel Scheduling of Malleable Tasks 
0  Offline Scheduling of Multithreaded Request Streams on a Caching Server 
0  Scheduling Functionally Heterogeneous Systems with Utilization Balancing 
0  Smith-Waterman Alignment of Huge Sequences with GPU in Linear Space
0  Accelerating Protein Sequence Search in a Heterogeneous Computing System 
0  Large-Scale Lattice Gas Monte Carlo Simulations for the Generalized Ising Model
0  A Scalable and Elastic Publish/Subscribe Service 