16 Singlehop Collaborative Feedback Primitives for Threshold Querying in Wireless Sensor Networks
16 QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators
13 Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures
12 PHAST: Hardware-Accelerated Shortest Path Trees
10 PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures
10 Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications
10 Multi-GPU MapReduce on GPU Clusters
9 Challenges of Scaling Algebraic Multigrid Across Modern Multicore Architectures
9 A Study of Parallel Particle Tracing for Steady-State and Time-Varying Flow Fields
8 Hauberk: Lightweight Silent Data Corruption Error Detector for GPGPU
8 Graph Partitioning with Natural Cuts
6 Communication-Avoiding QR Decomposition for GPUs
6 CATCH: A Cloud-Based Adaptive Data Transfer Service for HPC
5 Willow: A Control System for Energy and Thermal Adaptive Computing
5 Measuring Temporal Lags in Delay-Tolerant Networks
5 Automatic Library Generation for BLAS3 on GPUs
5 Power Token Balancing: Adapting CMPs to Power Constraints for Parallel Multithreaded Workloads
5 Adding a Referee to an Interconnection Network: What Can(not) Be Computed in One Round
5 An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU
4 Iso-Energy-Efficiency: An Approach to Power-Constrained Parallel Computation
4 Single Node On-Line Simulation of MPI Applications with SMPI
4 X10 as a Parallel Language for Scientific Computation: Practice and Experience
4 DryadOpt: Branch-and-Bound on Distributed Data-Parallel Execution Engines
3 Minimum Cost Resource Allocation for Meeting Job Requirements
3 Overlapping Computation and Communication for Advection on Hybrid Parallel Computers
3 On Nonblocking Folded-Clos Networks in Computer Communication Environments
3 A Lightweight Method for Automated Design of Convergence
3 Distributed Fine-Grained Access Control in Wireless Sensor Networks
3 Design of MILC Lattice QCD Application for GPU Clusters
3 Automated Architecture-Aware Mapping of Streaming Applications Onto GPUs
3 Computing Strongly Connected Components in Parallel on CUDA
3 A New Data Layout for Set Intersection on GPUs
3 Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication
3 Using Shared Memory to Accelerate MapReduce on Graphics Processing Units
3 Co-analysis of RAS Log and Job Log on Blue Gene/P
3 CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications
3 Communication Optimizations for Distributed-Memory X10 Programs
3 A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms
2 Power-Aware Replica Placement and Update Strategies in Tree Networks
2 Architectural Constraints to Attain 1 Exaflop/s for Three Scientific Application Classes
2 Snap-Stabilizing Committee Coordination
2 A Practical Approach for Performance Analysis of Shared-Memory Programs
2 Moving the Code to the Data - Dynamic Code Deployment Using ActiveSpaces
2 LACIO: A New Collective I/O Strategy for Parallel I/O Systems
2 A Quantitative Analysis of OS Noise
2 GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs
2 Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms
2 Model-Driven SIMD Code Generation for a Multi-resolution Tensor Kernel
2 Tight Analysis of Relaxed Multi-organization Scheduling Algorithms
1 Exploiting Data Similarity to Reduce Memory Footprints
1 vFtree - A Fat-Tree Routing Algorithm Using Virtual Lanes to Alleviate Congestion
1 SC-OA: A Secure and Efficient Scheme for Origin Authentication of Interdomain Routing in Cloud Computing Networks
1 Completely Distributed Particle Filters for Target Tracking in Sensor Networks
1 Hardware-Based Job Queue Management for Manycore Architectures and OpenMP Environments
1 HK-NUCA: Boosting Data Searches in Dynamic Non-Uniform Cache Architectures for Chip Multiprocessors
1 Variable Granularity Access Tracking Scheme for Improving the Performance of Software Transactional Memory
1 The Weighted Byzantine Agreement Problem
1 On Optimal Tree Traversals for Sparse Matrix Factorization
1 Critical Bubble Scheme: An Efficient Implementation of Globally Aware Network Flow Control
1 RDMA Capable iWARP over Datagrams
1 Partitioning Spatially Located Computations Using Rectangles
1 Reducing Fragmentation on Torus-Connected Supercomputers
1 Online Adaptive Code Generation and Tuning
1 A Communication-Avoiding, Hybrid-Parallel, Rank-Revealing Orthogonalization Method
1 Scheduling Parallel Iterative Applications on Volatile Resources
1 The Impact of Soft Resource Allocation on n-Tier Application Scalability
1 Implementation and Performance Evaluation of the HPC Challenge Benchmarks in Coarray Fortran 2.0
1 Parallel Metagenomic Sequence Clustering Via Sketching and Maximal Quasi-clique Enumeration on Map-Reduce Clouds
1 CABdedupe: A Causality-Based Deduplication Performance Booster for Cloud Backup Services
0 Power and Performance Management in Priority-Type Cluster Computing Systems
0 VisIO: Enabling Interactive Visualization of Ultra-Scale, Time Series Data via High-Bandwidth Distributed I/O Systems
0 A Novel Power Management for CMP Systems in Data-Intensive Environment
0 Characterization of System Services and Their Performance Impact in Multi-core Nodes
0 Automatic Recognition of Performance Idioms in Scientific Applications
0 A Study of Speculative Distributed Scheduling on the Cell/B.E.
0 The Evaluation of an Effective Out-of-Core Run-Time System in the Context of Parallel Mesh Generation
0 Enriching 3-D Video Games on Multicores
0 Redesign of Higher-Level Matrix Algorithms for Multicore and Distributed Architectures and Applications in Quantum Monte Carlo Simulation
0 A Performance and Area Efficient Architecture for Intrusion Detection Systems
0 Time-Ordered Event Traces: A New Debugging Primitive for Concurrency Bugs
0 Connectivity Trade-offs in 3D Wireless Sensor Networks Using Directional Antennae
0 Multifrontal Factorization of Sparse SPD Matrices on GPUs
0 Large-Scale Semantic Concept Detection on Manycore Platforms for Multimedia Mining
0 Efficient GPU Implementation for Particle in Cell Algorithm
0 A Very Fast Simulator for Exploring the Many-Core Future
0 Automatic Loop Tiling for Direct Memory Access
0 Tolerant Value Speculation in Coarse-Grain Streaming Computations
0 Improved Algorithms for the Distributed Trigger Counting Problem
0 Leveraging Social Networks to Combat Collusion in Reputation Systems for Peer-to-Peer Networks
0 Fast Community Detection Algorithm with GPUs and Multicore Architectures
0 A Scalable Reverse Lookup Scheme Using Group-Based Shifted Declustering Layout
0 Deadlock-Free Oblivious Routing for Arbitrary Topologies
0 Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs
0 Optimizing Large-Scale Graph Analysis on a Multi-threaded, Multi-core Platform
0 GRAL: A Grouping Algorithm to Optimize Application Placement in Wireless Embedded Systems
0 Vitis: A Gossip-based Hybrid Overlay for Internet-scale Publish/Subscribe Enabling Rendezvous Routing in Unstructured Overlay Networks
0 High Performance Scalable and Expressive Modeling Environment to Study Mobile Malware in Large Dynamic Networks
0 H-Code: A Hybrid MDS Array Code to Optimize Partial Stripe Writes in RAID-6
0 Unified Signatures for Improving Performance in Transactional Memory
0 Flease - Lease Coordination Without a Lock Server
0 Minimal Obstructions for the Coordinated Attack Problem and Beyond
0 Shared Resource Monitoring and Throughput Optimization in Cloud-Computing Datacenters
0 Profiling Directed NUMA Optimization on Linux Systems: A Case Study of the Gaussian Computational Chemistry Code
0 I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors
0 Reader Activation Scheduling in Multi-reader RFID Systems: A Study of General Case
0 Efficient Parallel Scheduling of Malleable Tasks
0 Offline Scheduling of Multi-threaded Request Streams on a Caching Server
0 Scheduling Functionally Heterogeneous Systems with Utilization Balancing
0 Smith-Waterman Alignment of Huge Sequences with GPU in Linear Space
0 Accelerating Protein Sequence Search in a Heterogeneous Computing System
0 Large-Scale Lattice Gas Monte Carlo Simulations for the Generalized Ising Model
0 A Scalable and Elastic Publish/Subscribe Service