13 Mint: Realizing CUDA performance in 3D Stencil Methods with Annotated C
9 Hystor: Making the Best Use of Solid State Drives in High Performance Storage Systems
8 Page Placement in Hybrid Memory Systems
7 A QHD-Capable Parallel H.264 Decoder
6 Automatic generation of executable communication specifications from parallel applications
6 ZEBRA: A Data-Centric, Hybrid-Policy Hardware Transactional Memory Design
4 Coordinating Processor and Main Memory for Efficient Server Power Control
4 Transactional Conflict Decoupling and Value Prediction
4 An Idiom-finding Tool for Increasing Productivity of Accelerators
4 High Performance Linpack Benchmark: A Fault Tolerant Implementation without Checkpointing
4 Generic Topology Mapping Strategies for Large-scale Parallel Architectures
3 Karma: Scalable Deterministic Record-Replay
3 Performance Impact and Interplay of SSD Parallelism through Advanced Commands, Allocation Strategy and Data Granularity
3 Modeling the Performance of an Algebraic Multigrid Cycle on HPC Platforms
2 Using GPU to Compute Large Out-of-card FFTs
2 Controlling Cache Utilization of HPC Applications
2 Predictive Coordination of Multiple On-Chip Resources for Chip Multiprocessors
2 Scalable Fine-grained Call Path Tracing
2 Multiset Signatures for Transactional Memory
1 Active Pebbles: Parallel Programming for Data-Driven Applications
1 SecureME: A Hardware-Software Approach to Full System Security
1 Characterizing the Impact of Soft Errors on Iterative Methods in Scientific Computing
1 The elephant and the mice: the role of non-strict fine-grain synchronization for modern many-core architectures
1 An Execution Strategy and Optimized Runtime Support for Parallelizing Irregular Reductions on Modern GPUs
1 MDR: Performance model driven runtime for heterogeneous parallel platforms
1 Processing data streams with hard real-time constraints on heterogeneous systems
0 Optimizing the Datacenter for Data-Centric Workloads
0 A Composite and Scalable Cache Coherence Protocol for Large Scale CMPs
0 Automatic SIMD Vectorization of Fast Fourier Transforms for the Larrabee and AVX Instruction Sets
0 MP-PIPE: A Massively Parallel Protein-Protein Interaction Prediction Engine
0 Optimizing Throughput/Power Tradeoffs in Hardware Transactional Memory Using DVFS and Intelligent Scheduling
0 Automating GPU Computing in MATLAB
0 Cosmic Microwave Background Map-Making At The Petascale And Beyond
0 Cost-Effectively Offering Private Buffers from a Shared Cache
0 F^2BFLY: An On-Chip Free-Space Optical Network with Wavelength-Switching