| Citations | Title |
|---:|---|
| 37 | An auto-tuning framework for parallel multicore stencil computations |
| 34 | Inter-block GPU communication via fast barrier synchronization |
| 33 | eScience in the cloud: A MODIS satellite data reprojection and reduction pipeline in the Windows Azure platform |
| 32 | Palacios and Kitten: New high performance operating systems for scalable virtualized and native supercomputing |
| 31 | GPU sample sort |
| 25 | PreDatA - preparatory data analytics on peta-scale machines |
| 25 | Consistency in hindsight: A fully decentralized STM algorithm |
| 22 | Oblivious algorithms for multicores and network of processors |
| 21 | BlobSeer: Bringing high throughput under heavy concurrency to Hadoop Map-Reduce applications |
| 21 | SLAW: A scalable locality-aware adaptive work-stealing scheduler |
| 19 | A cost-effective strategy for intermediate data storage in scientific cloud workflow systems |
| 16 | DEBAR: A scalable high-performance de-duplication storage system for backup and archiving |
| 16 | High performance comparison-based sorting algorithm on many-core GPUs |
| 16 | Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures |
| 16 | Dynamic load balancing on single- and multi-GPU systems |
| 14 | A lock-free, cache-efficient multi-core synchronization mechanism for line-rate network traffic monitoring |
| 14 | Performance evaluation of concurrent collections on high-performance multicore computing systems |
| 13 | Engineering a scalable high quality graph partitioner |
| 13 | QR factorization of tall and skinny matrices in a grid computing environment |
| 13 | Dynamic analysis of the relay cache-coherence protocol for distributed transactional memory |
| 12 | Tile QR factorization with parallel panel processing for multicore architectures |
| 12 | Structuring the execution of OpenMP applications for multicore architectures |
| 11 | Evaluating standard-based self-virtualizing devices: A performance study on 10 GbE NICs with SR-IOV support |
| 11 | Improving the performance of Uintah: A large-scale adaptive meshing computational framework |
| 11 | Speculative execution on multi-GPU systems |
| 11 | Parallel I/O performance: From events to ensembles |
| 11 | Highly scalable parallel sorting |
| 11 | Executing task graphs using work-stealing |
| 10 | Improving the performance of hypervisor-based fault tolerance |
| 10 | Exploiting inter-thread temporal locality for chip multithreading |
| 10 | Hybrid MPI/OpenMP power-aware computing |
| 10 | Clustering JVMs with software transactional memory support |
| 10 | MMT: Exploiting fine-grained parallelism in dynamic memory management |
| 9 | A high-performance fault-tolerant software framework for memory on commodity GPUs |
| 9 | Algorithmic Cholesky factorization fault recovery |
| 9 | Oversubscription on multicore processors |
| 9 | Optimal loop unrolling for GPGPU programs |
| 8 | Servet: A benchmark suite for autotuning on multicore clusters |
| 8 | Dynamic fractional resource scheduling for HPC workloads |
| 8 | Adapting cache partitioning algorithms to pseudo-LRU replacement policies |
| 8 | Performance and energy optimization of concurrent pipelined applications |
| 8 | Analyzing and adjusting user runtime estimates to improve job scheduling on the Blue Gene/P |
| 8 | First experiences with congestion control in InfiniBand hardware |
| 8 | Implementing the Himeno benchmark with CUDA on GPU clusters |
| 7 | Adapting communication-avoiding LU and QR factorizations to multicore architectures |
| 7 | Identifying ad-hoc synchronization for enhanced race detection |
| 7 | DynTile: Parametric tiled loop generation for parallel execution on multicore processors |
| 7 | KRASH: Reproducible CPU load generation on many-core machines |
| 7 | Parallel external memory graph algorithms |
| 7 | Large-scale multi-dimensional document clustering on GPU clusters |
| 7 | An introductory exascale feasibility study for FFTs and multigrid |
| 7 | Parallel de novo assembly of large genomes from high-throughput short reads |
| 6 | Varying bandwidth resource allocation problem with bag constraints |
| 6 | Offline library adaptation using automatically generated heuristics |
| 6 | Locality-aware adaptive grain signatures for Transactional Memories |
| 6 | Power-aware MPI task aggregation prediction for high-end computing systems |
| 6 | Optimization of linked list prefix computations on multithreaded GPUs using CUDA |
| 6 | Overlays with preferences: Approximation algorithms for matching with preference lists |
| 6 | Decentralized resource management for multi-core desktop grids |
| 6 | Algorithmic mechanisms for internet-based master-worker computing with untrusted and selfish workers |
| 6 | A novel application of parallel betweenness centrality to power grid contingency analysis |
| 5 | Toward understanding heterogeneity in computing |
| 5 | HPDA: A hybrid parity-based disk array for enhanced performance and reliability |
| 5 | On-line detection of large-scale parallel application's structure |
| 5 | Optimization of applications with non-blocking neighborhood collectives via multisends on the Blue Gene/P supercomputer |
| 5 | Dynamically tuned push-relabel algorithm for the maximum flow problem on CPU-GPU-Hybrid platforms |
| 5 | Parallelization of tau-leap coarse-grained Monte Carlo simulations on GPUs |
| 5 | Extreme scale computing: Modeling the impact of system noise in multicore clustered systems |
| 5 | Runtime checking of serializability in software transactional memory |
| 4 | A hybrid Interest Management mechanism for peer-to-peer Networked Virtual Environments |
| 4 | Linpack evaluation on a supercomputer with heterogeneous accelerators |
| 4 | Scheduling algorithms for linear workflow optimization |
| 4 | A scheduling framework for large-scale, parallel, and topology-aware applications |
| 4 | Exploiting the forgiving nature of applications for scalable parallel execution |
| 4 | Supporting fault tolerance in a data-intensive computing middleware |
| 4 | Load regulating algorithm for static-priority task scheduling on multiprocessors |
| 4 | Exploiting set-level non-uniformity of capacity demand to enhance CMP cooperative caching |
| 4 | A dynamic approach for characterizing collusion in desktop grids |
| 4 | Hierarchical phasers for scalable synchronization and reductions in dynamic parallelism |
| 4 | QoS assessment of WS-BPEL processes through non-Markovian stochastic Petri nets |
| 3 | A general algorithm for detecting faults under the comparison diagnosis model |
| 3 | On the importance of bandwidth control mechanisms for scheduling on large scale heterogeneous platforms |
| 3 | Broadcasting on large scale heterogeneous platforms under the bounded multi-port model |
| 3 | Parallel computation of best connections in public transportation networks |
| 3 | Distributive waveband assignment in multi-granular optical networks |
| 3 | Hypergraph-based task-bundle scheduling towards efficiency and fairness in heterogeneous distributed systems |
| 3 | Using the middle tier to understand cross-tier delay in a multi-tier application |
| 3 | Masking I/O latency using application level I/O caching and prefetching on Blue Gene systems |
| 3 | Analyzing the soft error resilience of linear solvers on multicore multiprocessors |
| 3 | Service and resource discovery in cycle-sharing environments with a utility algebra |
| 2 | Distributed advance network reservation with delay guarantees |
| 2 | Improving numerical reproducibility and stability in large-scale numerical simulations on GPUs |
| 2 | A parallel architecture for meaning comparison |
| 2 | Reconciling scratch space consumption, exposure, and volatility to achieve timely staging of job input data |
| 2 | Analysis of durability in replicated distributed storage systems |
| 2 | A local, distributed constant-factor approximation algorithm for the dynamic facility location problem |
| 2 | Robust control-theoretic thermal balancing for server clusters |
| 2 | Object-oriented stream programming using aspects |
| 2 | Midpoint routing algorithms for Delaunay triangulations |
| 2 | Adaptive sampling-based profiling techniques for optimizing the distributed JVM runtime |
| 2 | Fine-grained QoS scheduling for PCM-based main memory systems |
| 2 | A simple thermal model for multi-core processors and its application to slack allocation |
| 2 | Intra-application cache partitioning |
| 2 | A scalable algorithm for maintaining perpetual system connectivity in dynamic distributed systems |
| 2 | Improving the performance of program monitors with compiler support in multi-core environment |
| 2 | Performance impact of resource contention in multicore systems |
| 2 | Power-aware resource provisioning in cluster computing |
| 1 | Sparse power-efficient topologies for wireless ad hoc sensor networks |
| 1 | GenerOS: An asymmetric operating system kernel for multi-core systems |
| 1 | A multi-source label-correcting algorithm for the all-pairs shortest paths problem |
| 1 | Balls into non-uniform bins |
| 1 | Direct self-consistent field computations on GPU clusters |
| 1 | Profitability-based power allocation for speculative multithreaded systems |
| 1 | Using focused regression for accurate time-constrained scaling of scientific applications |
| 1 | ADEPT scalability predictor in support of adaptive resource allocation |
| 1 | Contention-based georouting with guaranteed delivery, minimal communication overhead, and shorter paths in wireless sensor networks |
| 0 | Scalable multi-pipeline architecture for high performance multi-pattern string matching |
| 0 | QoS aware BiNoC architecture |
| 0 | Fisheye lens distortion correction on multicore and hardware accelerator platforms |
| 0 | A low cost split-issue technique to improve performance of SMT clustered VLIW processors |
| 0 | Attack-resistant frequency counting |
| 0 | Parallelization of DQMC simulation for strongly correlated electron systems |
| 0 | Achieve constant performance guarantees using asynchronous crossbar scheduling without speedup |
| 0 | Scalable failure recovery for high-performance data aggregation |
| 0 | Head-body partitioned string matching for Deep Packet Inspection with scalable and attack-resilient performance |
| 0 | Stabilizing pipelines for streaming applications |
| 0 | Efficient parallel algorithms for maximum-density segment problem |