27 Copperhead: Compiling an Embedded Data Parallel Language
25 A domain specific approach to heterogeneous parallelism
20 Accelerating CUDA Graph Algorithms at Maximum Warp
11 Lock-free and scalable multi-version Software Transactional Memory
10 Achieving a Single Compute Device Image in OpenCL for Multiple GPUs
9 ULCC: A User-Level Facility for Optimizing Shared Cache Performance on Multicores
9 Auto-tuning of Fast Fourier Transform on Graphics Processors
8 Cooperative Reasoning for Preemptive Execution
8 Lifeline-based Global Load Balancing
7 OoOJava: Software Out-of-Order Execution
7 SpiceC: Scalable Parallelism via Implicit Copying and Explicit Commit
7 All-Window Profiling and Composable Models of Cache Sharing
7 Wait-Free Queues With Multiple Enqueuers and Dequeuers
6 Transaction Communicators: Enabling Cooperation Among Concurrent Transactions
6 The STAPL Parallel Container Framework
5 Ordered vs. Unordered: a Comparison of Parallelism and Work-Efficiency in Irregular Algorithms
5 ScalaExtrap: Trace-Based Communication Extrapolation for SPMD Programs
5 Communicating Memory Transactions
4 GRace: A Low-Overhead Mechanism for Detecting Data Races in GPU Programs
4 COREMU: A Scalable and Portable Parallel Full-system Emulator
4 CSX: An Extended Compression Format for SpMV on Shared Memory Systems
3 Programming the Memory Hierarchy Revisited: Supporting Irregular Parallelism in Sequoia
2 Enhanced Speculative Parallelization Via Incremental Recovery
1 Compact Data Structure and Scalable Algorithms for the Sparse Grid Technique
1 Inferring Ownership Transfer for Efficient Message Passing
1 Thread Contracts for Safe Parallelism