| | |
| --- | --- |
| 443 | Optimization principles and application performance evaluation of a multithreaded GPU using CUDA |
| 181 | Dynamic performance tuning of word-based software transactional memory |
| 156 | On the correctness of transactional memory |
| 98 | Transactional boosting: a methodology for highly-concurrent transactional objects |
| 91 | Software transactional memory for large scale clusters |
| 68 | Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories |
| 63 | FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue |
| 47 | ZOID: I/O-forwarding infrastructure for petascale architectures |
| 43 | SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks |
| 34 | Massively parallel LDPC decoding on GPU |
| 29 | Modeling optimistic concurrency using quantitative dependence analysis |
| 28 | Nested parallelism in transactional memory |
| 27 | Performance without pain = productivity: data layout and collective communication in UPC |
| 25 | A portable runtime interface for multi-level memory hierarchies |
| 24 | Toward high performance nonblocking software transactional memory |
| 23 | Programming with tiles |
| 23 | Design and implementation of a high-performance MPI for C# and the common language infrastructure |
| 19 | Split hardware transactions: true nesting of transactions using best-effort hardware transactional memory |
| 14 | Quasi-static scheduling for safe futures |
| 14 | Matrix product on heterogeneous master-worker platforms |
| 11 | Scalable packet classification using interpreting: a cross-platform multi-core solution |
| 10 | Type inference for locality analysis of distributed data structures |
| 10 | High performance dense linear algebra on a spatially distributed processor |
| 5 | Concurrent GC leveraging transactional memory |
| 4 | A case study in SIMD text processing with parallel bit streams: UTF-8 to UTF-16 transcoding |