[Figure: SC, PPoPP, ICS, IPDPS, ISCA, ASPLOS, MICRO, and HPCA by year, 2002 to 2011 (PPoPP: 2003 and 2005 to 2011; ASPLOS: 2002, 2004, 2006, and 2008 to 2011)]

185 Designing efficient sorting algorithms for manycore GPUs
95 Cost-benefit analysis of Cloud Computing versus desktop grids
57 A scalable auto-tuning framework for compiler optimization
55 Adaptable, metadata rich IO methods for portable high performance IO
46 A cross-input adaptive framework for GPU program optimizations
42 Work-first and help-first scheduling policies for async-finish task parallelism
38 DMTCP: Transparent checkpointing for cluster computations and the desktop
34 Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors
34 Small-file access in parallel file systems
34 Exploring the multiple-GPU design space
31 Message passing on data-parallel architectures
29 Singular value decomposition on GPU using CUDA
29 vCUDA: GPU accelerated high performance computing in virtual machines
29 Information spreading in stationary Markovian evolving graphs
27 Annotation-based empirical performance tuning using Orio
25 A framework for efficient and scalable execution of domain-specific templates on GPUs
24 An asynchronous leader election algorithm for dynamic networks
22 Efficient large-scale model checking
21 Compiler-enhanced incremental checkpointing for OpenMP applications
21 CellMR: A framework for supporting mapreduce on asymmetric cell-based clusters
21 Unit disk graph and physical interference model: Putting pieces together
19 Best-effort parallel execution framework for Recognition and mining applications
18 Phaser accumulators: A new reduction construct for dynamic parallelism
17 Autonomic management of non-functional concerns in distributed & parallel application programming
17 Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap
16 Sequence alignment with GPU: Performance and design challenges
16 Evaluating the use of GPUs in liver image segmentation and HMMER database searches
16 Elastic scaling of data parallel operators in stream processing
16 Input-independent, scalable and fast string matching on the Cray XMT
15 Automatic detection of parallel applications computation phases
15 Making resonance a common case: A high-performance implementation of collective I/O on parallel file systems
15 Parallel data-locality aware stencil computations on modern micro-architectures
13 Handling OS jitter on multicore multithreaded systems
13 Treat-before-trick: Free-riding prevention for BitTorrent-like peer-to-peer networks
13 Taking the heat off transactions: Dynamic selection of pessimistic concurrency control
12 Multi-dimensional characterization of temporal data mining on graphics processors
12 High-order stencil computations on multicore clusters
12 Offer-based scheduling of deadline-constrained Bag-of-Tasks applications for utility computing systems
11 Minimizing total busy time in parallel scheduling with application to optical networks
11 Scalability challenges for massively parallel AMR applications
11 Map construction and exploration by mobile agents scattered in a dangerous network
10 Speculation-based conflict resolution in hardware transactional memory
10 Sensor network connectivity with multiple directional antennae of a given angular sum
10 A snap-stabilizing point-to-point communication protocol in message-switched networks
9 Core-aware memory access scheduling schemes
9 Using hardware transactional memory for data race detection
9 Competitive buffer management with packet dependencies
9 Building a parallel pipelined external memory algorithm library
9 Remote-spanners: What to know beyond neighbors
8 Efficient microarchitecture policies for accurately adapting to power constraints
8 Energy minimization for periodic real-time tasks on heterogeneous processing units
8 Compact graph representations and parallel connectivity algorithms for massive dynamic network analysis
8 Parallel short sequence mapping for high throughput genome sequencing
8 Disjoint-path routing: Efficient communication for streaming applications
8 Performance analysis of Optical Packet Switches enhanced with electronic buffering
8 A fusion-based approach for tolerating faults in finite state machines
8 Self-stabilizing minimum-degree spanning tree within one from the optimal degree
7 Efficient scheduling of task graph collections on heterogeneous resources
7 HPCC Random Access benchmark for next generation supercomputers
7 Helgrind+: An efficient dynamic race detector
7 A resource allocation approach for supporting time-critical applications in grid environments
7 A metascalable computing framework for large spatiotemporal-scale atomistic simulations
7 Parallel accelerated cartesian expansions for particle dynamics simulations
6 Crash fault detection in celerating environments
6 A partition-based approach to support streaming updates over persistent data in an active datawarehouse
6 Performance projection of HPC applications using SPEC CFP2006 benchmarks
6 Scheduling resizable parallel applications
6 Minimizing startup costs for performance-critical threading
6 A component-based framework for the Cell Broadband Engine
5 Static strategies for worksharing with unrecoverable interruptions
5 An upload bandwidth threshold for peer-to-peer Video-on-Demand scalability
5 An on/off link activation method for low-power ethernet in PC clusters
5 An approach for matching communication patterns in parallel applications
5 Optimal deterministic self-stabilizing vertex coloring in unidirectional anonymous networks
5 Dynamic high-level scripting in parallel applications
5 Robust sequential resource allocation in heterogeneous distributed systems with random compute node failures
4 A new mechanism to deal with process variability in NoC links
4 Online time constrained scheduling with penalties
4 Architectural implications for spatial object association algorithms
4 Multiple priority customer service guarantees in cluster computing
4 Robust data placement in urgent computing environments
4 Validating Wrekavoc: A tool for heterogeneity emulation
4 Portable builds of HPC applications on diverse target platforms
3 Efficient shared cache management through sharing-aware replacement and streaming-aware insertion policy
3 The Weak Mutual Exclusion problem
3 Understanding the design trade-offs among current multicore systems for numerical computations
3 On the tradeoff between playback delay and buffer space in streaming
3 Resource-aware allocation strategies for divisible loads on large-scale systems
2 On scheduling dags to maximize area
2 On the complexity of mapping pipelined filtering services on heterogeneous platforms
2 Multi-users scheduling in parallel systems
2 Transitive closure on the cell broadband engine: A study on self-scheduling in a multicore processor
2 Accommodating bursts in distributed stream processing systems
2 Combinatorial properties for efficient communication in distributed networks with local interactions
2 Path-robust multi-channel wireless networks
2 A performance model for Fast Fourier Transform
2 Revisiting communication performance models for computational clusters
1 TupleQ: Fully-asynchronous and zero-copy MPI over InfiniBand
1 Design, implementation, and evaluation of transparent pNFS on Lustre
1 Dynamic iterations for the solution of ordinary differential equations on multicore processors
1 Packer: An innovative space-time-efficient parallel garbage collection algorithm based on virtual spaces
1 Parallel implementation of Irregular Terrain Model on IBM Cell Broadband Engine
1 A general approach to toroidal mesh decontamination with local immunity
0 Scalable RDMA performance in PGAS languages
0 Coupled placement in modern data centers
0 On reducing misspeculations in a pipelined scheduler
0 Concurrent SSA for general barrier-synchronized parallel programs
0 NewMadeleine: An efficient support for high-performance networks in MPICH2