

185  Designing efficient sorting algorithms for manycore GPUs 
95  Cost-benefit analysis of Cloud Computing versus desktop grids 
57  A scalable auto-tuning framework for compiler optimization 
55  Adaptable, metadata rich IO methods for portable high performance IO 
46  A cross-input adaptive framework for GPU program optimizations 
42  Work-first and help-first scheduling policies for async-finish task parallelism 
38  DMTCP: Transparent checkpointing for cluster computations and the desktop 
34  Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors 
34  Small-file access in parallel file systems 
34  Exploring the multiple-GPU design space 
31  Message passing on data-parallel architectures 
29  Singular value decomposition on GPU using CUDA 
29  vCUDA: GPU accelerated high performance computing in virtual machines 
29  Information spreading in stationary Markovian evolving graphs 
27  Annotation-based empirical performance tuning using Orio 
25  A framework for efficient and scalable execution of domain-specific templates on GPUs 
24  An asynchronous leader election algorithm for dynamic networks 
22  Efficient large-scale model checking 
21  Compiler-enhanced incremental checkpointing for OpenMP applications 
21  CellMR: A framework for supporting mapreduce on asymmetric cell-based clusters 
21  Unit disk graph and physical interference model: Putting pieces together 
19  Best-effort parallel execution framework for Recognition and mining applications 
18  Phaser accumulators: A new reduction construct for dynamic parallelism 
17  Autonomic management of non-functional concerns in distributed & parallel application programming 
17  Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap 
16  Sequence alignment with GPU: Performance and design challenges 
16  Evaluating the use of GPUs in liver image segmentation and HMMER database searches 
16  Elastic scaling of data parallel operators in stream processing 
16  Input-independent, scalable and fast string matching on the Cray XMT 
15  Automatic detection of parallel applications computation phases 
15  Making resonance a common case: A high-performance implementation of collective I/O on parallel file systems 
15  Parallel data-locality aware stencil computations on modern microarchitectures 
13  Handling OS jitter on multicore multithreaded systems 
13  Treat-before-trick: Free-riding prevention for BitTorrent-like peer-to-peer networks 
13  Taking the heat off transactions: Dynamic selection of pessimistic concurrency control 
12  Multi-dimensional characterization of temporal data mining on graphics processors 
12  High-order stencil computations on multicore clusters 
12  Offer-based scheduling of deadline-constrained Bag-of-Tasks applications for utility computing systems 
11  Minimizing total busy time in parallel scheduling with application to optical networks 
11  Scalability challenges for massively parallel AMR applications 
11  Map construction and exploration by mobile agents scattered in a dangerous network 
10  Speculation-based conflict resolution in hardware transactional memory 
10  Sensor network connectivity with multiple directional antennae of a given angular sum 
10  A snap-stabilizing point-to-point communication protocol in message-switched networks 
9  Core-aware memory access scheduling schemes 
9  Using hardware transactional memory for data race detection 
9  Competitive buffer management with packet dependencies 
9  Building a parallel pipelined external memory algorithm library 
9  Remote-spanners: What to know beyond neighbors 
8  Efficient microarchitecture policies for accurately adapting to power constraints 
8  Energy minimization for periodic real-time tasks on heterogeneous processing units 
8  Compact graph representations and parallel connectivity algorithms for massive dynamic network analysis 
8  Parallel short sequence mapping for high throughput genome sequencing 
8  Disjoint-path routing: Efficient communication for streaming applications 
8  Performance analysis of Optical Packet Switches enhanced with electronic buffering 
8  A fusion-based approach for tolerating faults in finite state machines 
8  Self-stabilizing minimum-degree spanning tree within one from the optimal degree 
7  Efficient scheduling of task graph collections on heterogeneous resources 
7  HPCC Random Access benchmark for next generation supercomputers 
7  Helgrind+: An efficient dynamic race detector 
7  A resource allocation approach for supporting time-critical applications in grid environments 
7  A metascalable computing framework for large spatiotemporal-scale atomistic simulations 
7  Parallel accelerated cartesian expansions for particle dynamics simulations 
6  Crash fault detection in celerating environments 
6  A partition-based approach to support streaming updates over persistent data in an active datawarehouse 
6  Performance projection of HPC applications using SPEC CFP2006 benchmarks 
6  Scheduling resizable parallel applications 
6  Minimizing startup costs for performance-critical threading 
6  A component-based framework for the Cell Broadband Engine 
5  Static strategies for worksharing with unrecoverable interruptions 
5  An upload bandwidth threshold for peer-to-peer Video-on-Demand scalability 
5  An on/off link activation method for low-power ethernet in PC clusters 
5  An approach for matching communication patterns in parallel applications 
5  Optimal deterministic self-stabilizing vertex coloring in unidirectional anonymous networks 
5  Dynamic high-level scripting in parallel applications 
5  Robust sequential resource allocation in heterogeneous distributed systems with random compute node failures 
4  A new mechanism to deal with process variability in NoC links 
4  Online time constrained scheduling with penalties 
4  Architectural implications for spatial object association algorithms 
4  Multiple priority customer service guarantees in cluster computing 
4  Robust data placement in urgent computing environments 
4  Validating Wrekavoc: A tool for heterogeneity emulation 
4  Portable builds of HPC applications on diverse target platforms 
3  Efficient shared cache management through sharing-aware replacement and streaming-aware insertion policy 
3  The Weak Mutual Exclusion problem 
3  Understanding the design tradeoffs among current multicore systems for numerical computations 
3  On the tradeoff between playback delay and buffer space in streaming 
3  Resource-aware allocation strategies for divisible loads on large-scale systems 
2  On scheduling dags to maximize area 
2  On the complexity of mapping pipelined filtering services on heterogeneous platforms 
2  Multi-users scheduling in parallel systems 
2  Transitive closure on the cell broadband engine: A study on self-scheduling in a multicore processor 
2  Accommodating bursts in distributed stream processing systems 
2  Combinatorial properties for efficient communication in distributed networks with local interactions 
2  Path-robust multi-channel wireless networks 
2  A performance model for Fast Fourier Transform 
2  Revisiting communication performance models for computational clusters 
1  TupleQ: Fully-asynchronous and zero-copy MPI over InfiniBand 
1  Design, implementation, and evaluation of transparent pNFS on Lustre 
1  Dynamic iterations for the solution of ordinary differential equations on multicore processors 
1  Packer: An innovative space-time-efficient parallel garbage collection algorithm based on virtual spaces 
1  Parallel implementation of Irregular Terrain Model on IBM Cell Broadband Engine 
1  A general approach to toroidal mesh decontamination with local immunity 
0  Scalable RDMA performance in PGAS languages 
0  Coupled placement in modern data centers 
0  On reducing misspeculations in a pipelined scheduler 
0  Concurrent SSA for general barrier-synchronized parallel programs 
0  NewMadeleine: An efficient support for high-performance networks in MPICH2 