| Count | Title |
|---|---|
| 16 | Singlehop Collaborative Feedback Primitives for Threshold Querying in Wireless Sensor Networks |
| 16 | QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators |
| 13 | Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures |
| 12 | PHAST: Hardware-Accelerated Shortest Path Trees |
| 10 | PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures |
| 10 | Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications |
| 10 | Multi-GPU MapReduce on GPU Clusters |
| 9 | Challenges of Scaling Algebraic Multigrid Across Modern Multicore Architectures |
| 9 | A Study of Parallel Particle Tracing for Steady-State and Time-Varying Flow Fields |
| 8 | Hauberk: Lightweight Silent Data Corruption Error Detector for GPGPU |
| 8 | Graph Partitioning with Natural Cuts |
| 6 | Communication-Avoiding QR Decomposition for GPUs |
| 6 | CATCH: A Cloud-Based Adaptive Data Transfer Service for HPC |
| 5 | Willow: A Control System for Energy and Thermal Adaptive Computing |
| 5 | Measuring Temporal Lags in Delay-Tolerant Networks |
| 5 | Automatic Library Generation for BLAS3 on GPUs |
| 5 | Power Token Balancing: Adapting CMPs to Power Constraints for Parallel Multithreaded Workloads |
| 5 | Adding a Referee to an Interconnection Network: What Can(not) Be Computed in One Round |
| 5 | An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU |
| 4 | Iso-Energy-Efficiency: An Approach to Power-Constrained Parallel Computation |
| 4 | Single Node On-Line Simulation of MPI Applications with SMPI |
| 4 | X10 as a Parallel Language for Scientific Computation: Practice and Experience |
| 4 | DryadOpt: Branch-and-Bound on Distributed Data-Parallel Execution Engines |
| 3 | Minimum Cost Resource Allocation for Meeting Job Requirements |
| 3 | Overlapping Computation and Communication for Advection on Hybrid Parallel Computers |
| 3 | On Nonblocking Folded-Clos Networks in Computer Communication Environments |
| 3 | A Lightweight Method for Automated Design of Convergence |
| 3 | Distributed Fine-Grained Access Control in Wireless Sensor Networks |
| 3 | Design of MILC Lattice QCD Application for GPU Clusters |
| 3 | Automated Architecture-Aware Mapping of Streaming Applications Onto GPUs |
| 3 | Computing Strongly Connected Components in Parallel on CUDA |
| 3 | A New Data Layout for Set Intersection on GPUs |
| 3 | Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication |
| 3 | Using Shared Memory to Accelerate MapReduce on Graphics Processing Units |
| 3 | Co-analysis of RAS Log and Job Log on Blue Gene/P |
| 3 | CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications |
| 3 | Communication Optimizations for Distributed-Memory X10 Programs |
| 3 | A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms |
| 2 | Power-Aware Replica Placement and Update Strategies in Tree Networks |
| 2 | Architectural Constraints to Attain 1 Exaflop/s for Three Scientific Application Classes |
| 2 | Snap-Stabilizing Committee Coordination |
| 2 | A Practical Approach for Performance Analysis of Shared-Memory Programs |
| 2 | Moving the Code to the Data - Dynamic Code Deployment Using ActiveSpaces |
| 2 | LACIO: A New Collective I/O Strategy for Parallel I/O Systems |
| 2 | A Quantitative Analysis of OS Noise |
| 2 | GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs |
| 2 | Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms |
| 2 | Model-Driven SIMD Code Generation for a Multi-resolution Tensor Kernel |
| 2 | Tight Analysis of Relaxed Multi-organization Scheduling Algorithms |
| 1 | Exploiting Data Similarity to Reduce Memory Footprints |
| 1 | vFtree - A Fat-Tree Routing Algorithm Using Virtual Lanes to Alleviate Congestion |
| 1 | SC-OA: A Secure and Efficient Scheme for Origin Authentication of Interdomain Routing in Cloud Computing Networks |
| 1 | Completely Distributed Particle Filters for Target Tracking in Sensor Networks |
| 1 | Hardware-Based Job Queue Management for Manycore Architectures and OpenMP Environments |
| 1 | HK-NUCA: Boosting Data Searches in Dynamic Non-Uniform Cache Architectures for Chip Multiprocessors |
| 1 | Variable Granularity Access Tracking Scheme for Improving the Performance of Software Transactional Memory |
| 1 | The Weighted Byzantine Agreement Problem |
| 1 | On Optimal Tree Traversals for Sparse Matrix Factorization |
| 1 | Critical Bubble Scheme: An Efficient Implementation of Globally Aware Network Flow Control |
| 1 | RDMA Capable iWARP over Datagrams |
| 1 | Partitioning Spatially Located Computations Using Rectangles |
| 1 | Reducing Fragmentation on Torus-Connected Supercomputers |
| 1 | Online Adaptive Code Generation and Tuning |
| 1 | A Communication-Avoiding, Hybrid-Parallel, Rank-Revealing Orthogonalization Method |
| 1 | Scheduling Parallel Iterative Applications on Volatile Resources |
| 1 | The Impact of Soft Resource Allocation on n-Tier Application Scalability |
| 1 | Implementation and Performance Evaluation of the HPC Challenge Benchmarks in Coarray Fortran 2.0 |
| 1 | Parallel Metagenomic Sequence Clustering Via Sketching and Maximal Quasi-clique Enumeration on Map-Reduce Clouds |
| 1 | CABdedupe: A Causality-Based Deduplication Performance Booster for Cloud Backup Services |
| 0 | Power and Performance Management in Priority-Type Cluster Computing Systems |
| 0 | VisIO: Enabling Interactive Visualization of Ultra-Scale, Time Series Data via High-Bandwidth Distributed I/O Systems |
| 0 | A Novel Power Management for CMP Systems in Data-Intensive Environment |
| 0 | Characterization of System Services and Their Performance Impact in Multi-core Nodes |
| 0 | Automatic Recognition of Performance Idioms in Scientific Applications |
| 0 | A Study of Speculative Distributed Scheduling on the Cell/B.E. |
| 0 | The Evaluation of an Effective Out-of-Core Run-Time System in the Context of Parallel Mesh Generation |
| 0 | Enriching 3-D Video Games on Multicores |
| 0 | Redesign of Higher-Level Matrix Algorithms for Multicore and Distributed Architectures and Applications in Quantum Monte Carlo Simulation |
| 0 | A Performance and Area Efficient Architecture for Intrusion Detection Systems |
| 0 | Time-Ordered Event Traces: A New Debugging Primitive for Concurrency Bugs |
| 0 | Connectivity Trade-offs in 3D Wireless Sensor Networks Using Directional Antennae |
| 0 | Multifrontal Factorization of Sparse SPD Matrices on GPUs |
| 0 | Large-Scale Semantic Concept Detection on Manycore Platforms for Multimedia Mining |
| 0 | Efficient GPU Implementation for Particle in Cell Algorithm |
| 0 | A Very Fast Simulator for Exploring the Many-Core Future |
| 0 | Automatic Loop Tiling for Direct Memory Access |
| 0 | Tolerant Value Speculation in Coarse-Grain Streaming Computations |
| 0 | Improved Algorithms for the Distributed Trigger Counting Problem |
| 0 | Leveraging Social Networks to Combat Collusion in Reputation Systems for Peer-to-Peer Networks |
| 0 | Fast Community Detection Algorithm with GPUs and Multicore Architectures |
| 0 | A Scalable Reverse Lookup Scheme Using Group-Based Shifted Declustering Layout |
| 0 | Deadlock-Free Oblivious Routing for Arbitrary Topologies |
| 0 | Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs |
| 0 | Optimizing Large-Scale Graph Analysis on a Multi-threaded, Multi-core Platform |
| 0 | GRAL: A Grouping Algorithm to Optimize Application Placement in Wireless Embedded Systems |
| 0 | Vitis: A Gossip-based Hybrid Overlay for Internet-scale Publish/Subscribe Enabling Rendezvous Routing in Unstructured Overlay Networks |
| 0 | High Performance Scalable and Expressive Modeling Environment to Study Mobile Malware in Large Dynamic Networks |
| 0 | H-Code: A Hybrid MDS Array Code to Optimize Partial Stripe Writes in RAID-6 |
| 0 | Unified Signatures for Improving Performance in Transactional Memory |
| 0 | Flease - Lease Coordination Without a Lock Server |
| 0 | Minimal Obstructions for the Coordinated Attack Problem and Beyond |
| 0 | Shared Resource Monitoring and Throughput Optimization in Cloud-Computing Datacenters |
| 0 | Profiling Directed NUMA Optimization on Linux Systems: A Case Study of the Gaussian Computational Chemistry Code |
| 0 | I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors |
| 0 | Reader Activation Scheduling in Multi-reader RFID Systems: A Study of General Case |
| 0 | Efficient Parallel Scheduling of Malleable Tasks |
| 0 | Offline Scheduling of Multi-threaded Request Streams on a Caching Server |
| 0 | Scheduling Functionally Heterogeneous Systems with Utilization Balancing |
| 0 | Smith-Waterman Alignment of Huge Sequences with GPU in Linear Space |
| 0 | Accelerating Protein Sequence Search in a Heterogeneous Computing System |
| 0 | Large-Scale Lattice Gas Monte Carlo Simulations for the Generalized Ising Model |
| 0 | A Scalable and Elastic Publish/Subscribe Service |