| | |
| --- | --- |
| 414 | Giggle: a framework for constructing scalable replica location services |
| 372 | An overview of the BlueGene/L Supercomputer |
| 277 | MPICH-V: toward a scalable fault tolerant MPI for volatile nodes |
| 231 | Massive arrays of idle disks for storage archives |
| 195 | NAMD: biomolecular simulation on thousands of processors |
| 161 | A framework for performance modeling and prediction |
| 133 | Active harmony: towards automated performance tuning |
| 108 | ICENI: an open grid service architecture implemented with Jini |
| 103 | 16.4-Tflops direct numerical simulation of turbulence by a Fourier spectral method on the Earth Simulator |
| 102 | The Web Service Discovery Architecture |
| 96 | Performance optimizations and bounds for sparse matrix-vector multiply |
| 95 | A 26.58 Tflops global atmospheric simulation with the spectral transform method on the Earth Simulator |
| 86 | Applying Chimera virtual data concepts to cluster finding in the Sloan Sky Survey |
| 79 | SIGMA: a simulator infrastructure to guide memory analysis |
| 75 | UPC performance and potential: a NPB experimental study |
| 73 | Multivariate resource performance forecasting in the network weather service |
| 73 | High-density computing: a 240-processor Beowulf in one cubic meter |
| 73 | A decoupled scheduling approach for the GrADS program development environment |
| 70 | A TCP tuning daemon |
| 66 | The effects of systemic packet loss on aggregate TCP flows |
| 64 | Parallel multiscale Gauss-Newton-Krylov methods for inverse wave propagation |
| 64 | Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture |
| 63 | STORM: lightning-fast resource management |
| 61 | Interoperable Web services for computational portals |
| 58 | Scalable analysis techniques for microprocessor performance counter metrics |
| 56 | Executing multiple pipelined data analysis operations in the grid |
| 53 | An empirical performance evaluation of scalable scientific applications |
| 51 | Implementing the MPI process topology mechanism |
| 50 | SmartPointers: personalized scientific data portals in your hand |
| 46 | Salinas: a scalable software for high-performance structural and solid mechanics simulations |
| 45 | A high-level approach to synthesis of high-performance codes for quantum chemistry |
| 45 | 14.9 TFLOPS three-dimensional fluid simulation for fusion science with HPF on the Earth Simulator |
| 44 | Asserting performance expectations |
| 36 | Gilgamesh: a multithreaded processor-in-memory architecture for petaflops computing |
| 36 | Active Proxy-G: optimizing the query execution process in the grid |
| 34 | Better tiling and array contraction for compiling scientific programs |
| 32 | Disk cache replacement algorithm for storage resource managers in data grids |
| 28 | Compact application signatures for parallel and distributed scientific codes |
| 28 | Advanced visualization technology for terascale particle accelerator simulations |
| 28 | A scalable parallel fast multipole method for analysis of scattering from perfect electrically conducting surfaces |
| 27 | Ultra-high performance communication with MPI and the Sun Fire™ Link interconnect |
| 25 | Scalable directory services using proactivity |
| 24 | Data Reservoir: utilization of multi-gigabit backbone network for data-intensive research |
| 24 | Accelerating parallel maximum likelihood-based phylogenetic tree calculations using subtree equality vectors |
| 23 | Monitoring data archives for grid environments |
| 23 | Library support for hierarchical multi-processor tasks |
| 23 | Improving route lookup performance using network processor cache |
| 22 | On increasing architecture awareness in program optimizations to bridge the gap between peak and sustained processor performance: matrix-multiply revisited |
| 21 | MPI and OpenMP paradigms on cluster of SMP architectures: the vacancy tracking algorithm for multi-dimensional array transposition |
| 21 | Implementation and evaluation of a QoS-capable cluster-based IP router |
| 20 | Merging multiple data streams on common keys over high performance networks |
| 20 | Efficient synchronization for nonuniform communication architectures |
| 18 | A 29.5 Tflops simulation of planetesimals in Uranus-Neptune region on GRAPE-6 |
| 17 | SMP system interconnect instrumentation for performance analysis |
| 16 | QMView and GAMESS: integration into the world wide computational grid |
| 15 | Dual-level parallelism for deterministic and stochastic CFD problems |
| 13 | The Proteus multiprotocol message library |
| 13 | Separated high-bandwidth and low-latency communication in the cluster interconnect Clint |
| 11 | Collaborative simulation grid: multiscale quantum-mechanical/classical atomistic simulations on distributed PC clusters in the US and Japan |
| 10 | Early evaluation of the IBM p690 |
| 9 | Pipelined scheduling of tiled nested loops onto clusters of SMPs using memory mapped network interfaces |
| 9 | A new scheduling algorithm for parallel sparse LU factorization with static pivoting |
| 8 | A new data-mapping scheme for latency-tolerant distributed sparse triangular solution |
| 4 | Scaling the unscalable: a case study on the AlphaServer SC |
| 3 | Utilization of departmental computing GRID system for development of an artificial intelligent tapping inspection method, tapping sound analysis |
| 3 | Distributed dynamic hash tables using IBM LAPI |
| 2 | High performance computing meets experimental mathematics |