360 McRT-STM: a high performance software transactional memory system for a multi-core runtime
197 Hybrid transactional memory
139 POSH: a TLS compiler that exploits program structure
87 Exploiting distributed version concurrency in a transactional memory cluster
81 Performance evaluation of adaptive MPI
70 Predicting bounds on queuing delay for batch-scheduled parallel machines
68 Accurate and efficient runtime detection of atomicity errors in concurrent programs
63 Programming for parallelism and locality with hierarchically tiled arrays
57 Proving correctness of highly-concurrent linearisable objects
47 RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits
47 On-line automated performance diagnosis on thousands of processes
43 Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster
38 Hardware profile-guided automatic page placement for ccNUMA systems
37 Scalable synchronous queues
32 Adaptive scheduling with parallelism feedback
31 Mobile MPI programs in computational grids
30 Collective communication on architectures that support simultaneous communication over multiple links
22 Performance characterization of molecular dynamics techniques for biomolecular simulations
19 Optimizing irregular shared-memory applications for distributed-memory systems
18 Global-view abstractions for user-defined reductions and scans
17 High-performance IPv6 forwarding algorithm for multi-core and multithreaded network processor
13 Teaching parallel computing to science faculty: best practices and common pitfalls
12 A case study in top-down performance estimation for a large-scale parallel application
10 MAMA!: a memory allocator for multithreaded architectures
9 Parallel programming and code selection in Fortress
5 Fast and transparent recovery for continuous availability of cluster-based servers
0 Parallel programming in modern web search engines