(Professors James W. Demmel and Katherine A. Yelick)

(DOE) DE-FG03-94ER25219, (DOE) DE-FC03-98ER25351, and (LLNL/DOE) W-7405-ENG-48

The overall performance of a variety of scientific computing and information retrieval applications is dominated by a few computational kernels. One important class of such operations is sparse matrix kernels--computations with matrices having relatively few non-zero entries. Performance tuning of sparse kernels is a particularly tedious and time-consuming task because performance is a complicated function of the kernel, the machine, and the non-zero structure of the matrix: for every machine, a user must carefully choose a data structure and implementation (code and transformations) that minimize matrix storage while achieving the best possible performance.

The goal of this research project is to generate implementations of particular sparse kernels tuned to a given matrix and machine. Our work builds on Sparsity [1], an early successful prototype for sparse matrix-vector multiply (y = A*x, where A is a sparse matrix and x and y are dense vectors), in the following ways:

- Architecture-specific performance bounds: by careful analysis, we have shown that the code generated by the Sparsity system is often within 20% of the fastest possible. This places a limit on what additional low-level tuning (e.g., instruction scheduling) can achieve, and helps identify new opportunities for performance improvement [2]. We have applied a similar analysis to sparse triangular solve [3].
- New optimization techniques: we are currently exploring a large space of techniques to exploit a variety of matrix structures (symmetry, diagonals, bands, variable blocks, etc.) and to increase locality (reordering to create dense structure, cache blocking, multiple vectors, etc.). We will develop automatic techniques for deciding when and in what combinations to apply these methods.
- Extensions to higher-level kernels: we are exploring new sparse kernels, such as multiplying by A and A^T simultaneously, A*A^T*x (for interior point methods and the SVD), R*A*R^T (A and R are sparse; used in multigrid methods), and A^k*x, among others.
- Implementation of an automatically tuned sparse matrix library: we will make this work available as an implementation of the new Sparse BLAS standard [4], augmented by a single routine to facilitate automatic tuning.

This research is being conducted by members of the Berkeley Benchmarking and OPtimization (BeBOP) project [5].

- [1] E.-J. Im, "Optimizing the Performance of Sparse Matrix-Vector Multiplication," PhD thesis, UC Berkeley, May 2000.
- [2] R. Vuduc, J. Demmel, K. Yelick, et al., "Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply," *Supercomputing,* November 2002.
- [3] R. Vuduc, S. Kamil, et al., "Automatic Performance Tuning and Analysis of Sparse Triangular Solve," *Int. Conf. Supercomputing, Workshop on Performance Optimization of High-level Languages and Libraries,* June 2002.
- [4] BLAS Technical Forum: http://www.netlib.org/blas/blast-forum.
- [5] The BeBOP Homepage: http://www.cs.berkeley.edu/~richie/bebop.

(Professor James W. Demmel)

NTONC 29823

SUGAR is a simulation tool for microelectromechanical systems (MEMS) that inherits its name and philosophy from the IC simulator SPICE. SUGAR is used by researchers at the Berkeley Sensor and Actuator Center, as well as at outside sites. So far, about 500 users have filled out the registration form, and there are many unregistered users as well.

We are working to extend SUGAR in several ways:

- Applying homotopy methods to discover electrostatic pull-in, buckling of mechanical components, and other significant bifurcations
- Building fast reduced-order models that incorporate nonlinear effects
- Studying sensitivity of microsystem performance to deviations from nominal geometry and material properties
- Extending the set of models available to the user
- Building a framework for comparing results from optical measurement tools to simulation results, in order to improve the fidelity of the simulation and to better understand the behavior of different designs

We have also deployed a version of SUGAR as a web service running on the Millennium cluster, and the software has been used in a graduate introductory MEMS course.

An open-source SUGAR repository, under the BSD software license, is now hosted on sourceforge.net at http://sourceforge.net/projects/mems. Other relevant information may be found through this website.

SUGAR is a collaborative effort of many graduate students and professors. Affiliated graduate students include David Bindel, Jason V. Clark, David Garmire, Babak Jamshidi, Rishi Kant, Shyam Lakshmin, and Jiawang Nie. Affiliated professors include Alice Agogino, Zhaojun Bai, James Demmel, Sanjay Govindjee, Ming Gu, and Kristofer S. J. Pister.