| 185 | Designing efficient sorting algorithms for manycore GPUs |
| 95 | Cost-benefit analysis of Cloud Computing versus desktop grids |
| 57 | A scalable auto-tuning framework for compiler optimization |
| 55 | Adaptable, metadata rich IO methods for portable high performance IO |
| 46 | A cross-input adaptive framework for GPU program optimizations |
| 42 | Work-first and help-first scheduling policies for async-finish task parallelism |
| 38 | DMTCP: Transparent checkpointing for cluster computations and the desktop |
| 34 | Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors |
| 34 | Small-file access in parallel file systems |
| 34 | Exploring the multiple-GPU design space |
| 31 | Message passing on data-parallel architectures |
| 29 | Singular value decomposition on GPU using CUDA |
| 29 | vCUDA: GPU accelerated high performance computing in virtual machines |
| 29 | Information spreading in stationary Markovian evolving graphs |
| 27 | Annotation-based empirical performance tuning using Orio |
| 25 | A framework for efficient and scalable execution of domain-specific templates on GPUs |
| 24 | An asynchronous leader election algorithm for dynamic networks |
| 22 | Efficient large-scale model checking |
| 21 | Compiler-enhanced incremental checkpointing for OpenMP applications |
| 21 | CellMR: A framework for supporting mapreduce on asymmetric cell-based clusters |
| 21 | Unit disk graph and physical interference model: Putting pieces together |
| 19 | Best-effort parallel execution framework for Recognition and mining applications |
| 18 | Phaser accumulators: A new reduction construct for dynamic parallelism |
| 17 | Autonomic management of non-functional concerns in distributed & parallel application programming |
| 17 | Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap |
| 16 | Sequence alignment with GPU: Performance and design challenges |
| 16 | Evaluating the use of GPUs in liver image segmentation and HMMER database searches |
| 16 | Elastic scaling of data parallel operators in stream processing |
| 16 | Input-independent, scalable and fast string matching on the Cray XMT |
| 15 | Automatic detection of parallel applications computation phases |
| 15 | Making resonance a common case: A high-performance implementation of collective I/O on parallel file systems |
| 15 | Parallel data-locality aware stencil computations on modern micro-architectures |
| 13 | Handling OS jitter on multicore multithreaded systems |
| 13 | Treat-before-trick: Free-riding prevention for BitTorrent-like peer-to-peer networks |
| 13 | Taking the heat off transactions: Dynamic selection of pessimistic concurrency control |
| 12 | Multi-dimensional characterization of temporal data mining on graphics processors |
| 12 | High-order stencil computations on multicore clusters |
| 12 | Offer-based scheduling of deadline-constrained Bag-of-Tasks applications for utility computing systems |
| 11 | Minimizing total busy time in parallel scheduling with application to optical networks |
| 11 | Scalability challenges for massively parallel AMR applications |
| 11 | Map construction and exploration by mobile agents scattered in a dangerous network |
| 10 | Speculation-based conflict resolution in hardware transactional memory |
| 10 | Sensor network connectivity with multiple directional antennae of a given angular sum |
| 10 | A snap-stabilizing point-to-point communication protocol in message-switched networks |
| 9 | Core-aware memory access scheduling schemes |
| 9 | Using hardware transactional memory for data race detection |
| 9 | Competitive buffer management with packet dependencies |
| 9 | Building a parallel pipelined external memory algorithm library |
| 9 | Remote-spanners: What to know beyond neighbors |
| 8 | Efficient microarchitecture policies for accurately adapting to power constraints |
| 8 | Energy minimization for periodic real-time tasks on heterogeneous processing units |
| 8 | Compact graph representations and parallel connectivity algorithms for massive dynamic network analysis |
| 8 | Parallel short sequence mapping for high throughput genome sequencing |
| 8 | Disjoint-path routing: Efficient communication for streaming applications |
| 8 | Performance analysis of Optical Packet Switches enhanced with electronic buffering |
| 8 | A fusion-based approach for tolerating faults in finite state machines |
| 8 | Self-stabilizing minimum-degree spanning tree within one from the optimal degree |
| 7 | Efficient scheduling of task graph collections on heterogeneous resources |
| 7 | HPCC Random Access benchmark for next generation supercomputers |
| 7 | Helgrind+: An efficient dynamic race detector |
| 7 | A resource allocation approach for supporting time-critical applications in grid environments |
| 7 | A metascalable computing framework for large spatiotemporal-scale atomistic simulations |
| 7 | Parallel accelerated cartesian expansions for particle dynamics simulations |
| 6 | Crash fault detection in celerating environments |
| 6 | A partition-based approach to support streaming updates over persistent data in an active datawarehouse |
| 6 | Performance projection of HPC applications using SPEC CFP2006 benchmarks |
| 6 | Scheduling resizable parallel applications |
| 6 | Minimizing startup costs for performance-critical threading |
| 6 | A component-based framework for the Cell Broadband Engine |
| 5 | Static strategies for worksharing with unrecoverable interruptions |
| 5 | An upload bandwidth threshold for peer-to-peer Video-on-Demand scalability |
| 5 | An on/off link activation method for low-power ethernet in PC clusters |
| 5 | An approach for matching communication patterns in parallel applications |
| 5 | Optimal deterministic self-stabilizing vertex coloring in unidirectional anonymous networks |
| 5 | Dynamic high-level scripting in parallel applications |
| 5 | Robust sequential resource allocation in heterogeneous distributed systems with random compute node failures |
| 4 | A new mechanism to deal with process variability in NoC links |
| 4 | Online time constrained scheduling with penalties |
| 4 | Architectural implications for spatial object association algorithms |
| 4 | Multiple priority customer service guarantees in cluster computing |
| 4 | Robust data placement in urgent computing environments |
| 4 | Validating Wrekavoc: A tool for heterogeneity emulation |
| 4 | Portable builds of HPC applications on diverse target platforms |
| 3 | Efficient shared cache management through sharing-aware replacement and streaming-aware insertion policy |
| 3 | The Weak Mutual Exclusion problem |
| 3 | Understanding the design trade-offs among current multicore systems for numerical computations |
| 3 | On the tradeoff between playback delay and buffer space in streaming |
| 3 | Resource-aware allocation strategies for divisible loads on large-scale systems |
| 2 | On scheduling dags to maximize area |
| 2 | On the complexity of mapping pipelined filtering services on heterogeneous platforms |
| 2 | Multi-users scheduling in parallel systems |
| 2 | Transitive closure on the cell broadband engine: A study on self-scheduling in a multicore processor |
| 2 | Accommodating bursts in distributed stream processing systems |
| 2 | Combinatorial properties for efficient communication in distributed networks with local interactions |
| 2 | Path-robust multi-channel wireless networks |
| 2 | A performance model for Fast Fourier Transform |
| 2 | Revisiting communication performance models for computational clusters |
| 1 | TupleQ: Fully-asynchronous and zero-copy MPI over InfiniBand |
| 1 | Design, implementation, and evaluation of transparent pNFS on Lustre |
| 1 | Dynamic iterations for the solution of ordinary differential equations on multicore processors |
| 1 | Packer: An innovative space-time-efficient parallel garbage collection algorithm based on virtual spaces |
| 1 | Parallel implementation of Irregular Terrain Model on IBM Cell Broadband Engine |
| 1 | A general approach to toroidal mesh decontamination with local immunity |
| 0 | Scalable RDMA performance in PGAS languages |
| 0 | Coupled placement in modern data centers |
| 0 | On reducing misspeculations in a pipelined scheduler |
| 0 | Concurrent SSA for general barrier-synchronized parallel programs |
| 0 | NewMadeleine: An efficient support for high-performance networks in MPICH2 |