J. Demmel, J. Dongarra, B. N. Parlett, W. M. Kahan, M. Gu, D. Bindel, Y. Hida, X. Li, O. Marques, E. J. Riedy, C. Vomel, J. Langou, P. Luszczek, J. Kurzak, A. Buttari, J. Langou, and S. Tomov, "Prospectus for the next LAPACK and ScaLAPACK libraries," in Applied Parallel Computing: State of the Art in Scientific Computing. Proc. 8th Intl. Workshop (PARA 2006). Revised Selected Papers, B. Kagstrom, E. Elmroth, J. Dongarra, and J. Wasniewski, Eds., Lecture Notes in Computer Science, Vol. 4699, Berlin, Germany: Springer-Verlag, 2007, pp. 11-23.
D. S. Bindel, Z. Bai, and J. Demmel, "Model reduction for RF MEMS simulation," in Applied Parallel Computing: State of the Art in Scientific Computing. Proc. 7th Intl. Workshop (PARA 2004): Revised Selected Papers, J. Dongarra, K. Madsen, and J. Wasniewski, Eds., Lecture Notes in Computer Science, Vol. 3732, Berlin, Germany: Springer-Verlag, 2006, pp. 286-295.
E. J. Im, I. Bustany, C. Ashcraft, J. Demmel, and K. A. Yelick, "Performance tuning of matrix triple products based on matrix structure," in Applied Parallel Computing: State of the Art in Scientific Computing. Proc. 7th Intl. Workshop (PARA 2004): Revised Selected Papers, J. Dongarra, K. Madsen, and J. Wasniewski, Eds., Lecture Notes in Computer Science, Vol. 3732, Berlin, Germany: Springer-Verlag, 2006, pp. 740-746.
J. Nie and J. Demmel, "Shape Optimization of Transfer Functions," in Multiscale Optimization Methods and Applications, W. W. Hager, S. J. Huang, P. M. Pardalos, and O. A. Prokopyev, Eds., Nonconvex Optimization and Its Applications, Vol. 82, Berlin, Germany: Springer-Verlag, 2006, pp. 313-326.
D. S. Bindel, J. Demmel, M. J. Friedman, W. J. F. Govaerts, and Y. A. Kuznetsov, "Bifurcation analysis of large equilibrium systems in MATLAB," in Computational Science: Proc. 5th Intl. Conf. (ICCS 2005), V. S. Sunderam, G. D. van Albada, P. M. A. Sloot, and J. J. Dongarra, Eds., Lecture Notes in Computer Science, Vol. 3514, Berlin, Germany: Springer-Verlag, 2005, pp. 50-57.
R. Vuduc, A. Gyulassy, J. Demmel, and K. A. Yelick, "Memory hierarchy optimizations and performance bounds for Sparse A^T Ax," in Computational Science: Proc. Intl. Conf. on Computational Science (ICCS 2003), P. M. A. Sloot, D. Abramson, A. V. Bogdanov, J. J. Dongarra, A. Y. Zomaya, and Y. E. Gorbachev, Eds., Lecture Notes in Computer Science, Vol. 2659, Berlin, Germany: Springer-Verlag, 2003, pp. 705-714.
L. A. Drummond, J. Demmel, C. R. Mechoso, H. Robinson, K. Sklower, and J. A. Spahr, "A data broker for distributed computing environments," in Computational Science -- Part I: Proc. Intl. Conf. (ICCS 2001), V. N. Alexandrov, J. Dongarra, B. A. Juliano, R. S. Renner, and C. J. K. Tan, Eds., Lecture Notes in Computer Science, Vol. 2073, Berlin, Germany: Springer-Verlag, 2001, pp. 31-40.
R. Vuduc, J. Demmel, and J. Bilmes, "Statistical models for automatic performance tuning," in Computational Science: Proc. Intl. Conf. on Computational Science (ICCS 2001), V. N. Alexandrov, J. J. Dongarra, B. A. Juliano, R. S. Renner, and C. J. K. Tan, Eds., Lecture Notes in Computer Science, Vol. 2073, Berlin, Germany: Springer-Verlag, 2001, pp. 117-126.
R. Vuduc and J. Demmel, "Code generators for automatic tuning of numerical kernels: Experiences with FFTW," in Semantics, Applications, and Implementation of Program Generation: Proc. 2000 Intl. Workshop (SAIG 2000), W. Taha, Ed., Lecture Notes in Computer Science, Vol. 1924, Berlin, Germany: Springer-Verlag, 2000, pp. 190-211.
L. S. Blackford, A. J. Cleary, J. Demmel, I. S. Dhillon, J. Dongarra, S. Hammarling, A. Petitet, H. Ren, K. Stanley, and R. C. Whaley, "Practical experience in the dangers of heterogeneous computing," in Applied Parallel Computing: Industrial Computation and Optimization. Proc. 3rd Intl. Workshop (PARA '96), J. Wasniewski, J. Dongarra, K. Madsen, and D. Olesen, Eds., Lecture Notes in Computer Science, Vol. 1184, Berlin, Germany: Springer-Verlag, 1996, pp. 57-64.
J. Choi, J. Demmel, I. S. Dhillon, J. Dongarra, S. Ostrouchov, A. Petitet, K. Stanley, D. W. Walker, and R. C. Whaley, "ScaLAPACK, a portable linear algebra library for distributed memory computers -- Design issues and performance," in Applied Parallel Computing: Computations in Physics, Chemistry, and Engineering Science. Proc. 2nd Intl. Workshop (PARA '95), J. Dongarra, K. Madsen, and J. Wasniewski, Eds., Lecture Notes in Computer Science, Vol. 1041, Berlin, Germany: Springer-Verlag, 1996, pp. 95-106.
Z. Bai, D. Day, J. Demmel, M. Gu, J. Dongarra, A. Ruhe, and H. van der Vorst, "Templates for Linear Algebra Problems (Invited Paper)," in Computer Science Today: Recent Trends and Developments, J. van Leeuwen, Ed., Lecture Notes in Computer Science, Vol. 1000, Berlin, Germany: Springer-Verlag, 1995, pp. 115-140.
J. Demmel, J. Dongarra, and W. M. Kahan, "On designing portable high performance numerical libraries," in Numerical Analysis 1991: Proc. 14th Dundee Conf. on Numerical Analysis, D. F. Griffiths and G. A. Watson, Eds., Pitman Research Notes in Mathematics, Essex, UK: Longman Scientific & Technical, 1992, pp. 69-84.
Y. You, Z. Zhang, C. Hsieh, J. Demmel, and K. Keutzer, "Fast deep neural network training on distributed systems and cloud TPUs," IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 11, pp. 2449-2462, Nov. 2019.
A. Azad, G. Ballard, A. Buluç, J. Demmel, L. Grigori, O. Schwartz, S. Toledo, and S. Williams, "Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication," SIAM Journal on Scientific Computing, vol. 38, no. 6, pp. C624-C651, 2016.
K. Asanović, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, N. Morgan, D. A. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. A. Yelick, "A View of the Parallel Computing Landscape," Communications of the ACM, vol. 52, no. 10, pp. 56-67, Oct. 2009.
J. Demmel, I. Dumitriu, O. Holtz, and R. Kleinberg, "Fast matrix multiplication is stable," Numerische Mathematik, vol. 106, no. 2, pp. 199-224, March 2007.
J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, R. C. Whaley, and K. A. Yelick, "Self-adapting linear algebra algorithms and software," Proc. IEEE, vol. 93, no. 2, pp. 293-312, Feb. 2005.
X. S. Li, J. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. M. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo, "Design, implementation and testing of Extended and Mixed Precision BLAS," ACM Trans. Mathematical Software, vol. 28, no. 2, pp. 152-205, June 2002.
L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R. C. Whaley, "An updated set of Basic Linear Algebra Subprograms (BLAS)," ACM Trans. Mathematical Software, vol. 28, no. 2, pp. 135-151, June 2002.
J. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, and J. W. H. Liu, "A supernodal approach to sparse partial pivoting," SIAM J. Matrix Analysis and Applications, vol. 20, no. 3, pp. 720-755, July 1999.
J. Saltz, A. Sussman, S. L. Graham, J. Demmel, S. Baden, and J. Dongarra, "Programming tools and environments," Communications of the ACM, vol. 41, no. 11, pp. 64-73, Nov. 1998.
S. Chakrabarti, J. Demmel, and K. A. Yelick, "Models and scheduling algorithms for mixed data and task parallel programs," J. Parallel and Distributed Computing: Special Issue on Dynamic Load Balancing, vol. 47, no. 1, pp. 168-184, Nov. 1997.
J. Dongarra and J. Demmel, "LAPACK: A portable high-performance numerical library for linear algebra," Supercomputer, vol. 8, no. 6, pp. 33-38, Nov. 1991.
Z. Bai and J. Demmel, "On a block implementation of Hessenberg multishift QR iteration," Intl. J. High Speed Computing, vol. 1, no. 1, pp. 97-112, May 1989.
J. Demmel, G. Lafferriere, J. Schwartz, and M. Sharir, "Theoretical and experimental studies using a planar multifinger manipulator," Naval Research Reviews, vol. 40, no. 3, pp. 14-23, 1988.
J. Demmel and B. Kagstrom, "Computing stable eigendecompositions of matrix pencils," Linear Algebra and Its Applications, vol. 88-89, pp. 139-186, April 1987.
Y. You, J. Li, S. Reddi, J. Hseu, S. Kumar, S. Bhojanapalli, X. Song, J. Demmel, K. Keutzer, and C. Hsieh, "Large batch optimization for deep learning: Training BERT in 76 minutes," in International Conference on Learning Representations, 2020.
Y. You, Z. Zhang, C. Hsieh, J. Demmel, and K. Keutzer, "ImageNet training in minutes," in Proceedings of the 47th International Conference on Parallel Processing, 2018, pp. 1-10.
B. C. Catanzaro, S. A. Kamil, Y. Lee, K. Asanović, J. Demmel, K. Keutzer, J. Shalf, K. A. Yelick, and A. Fox, "SEJITS: Getting productivity and performance with selective embedded JIT specialization," in Proceedings First Workshop on Programming Models for Emerging Architectures, 2009.
V. Volkov and J. Demmel, "Benchmarking GPUs to tune dense linear algebra," in Proc. 2008 ACM/IEEE Conf. on Supercomputing (SC '08), Piscataway, NJ: IEEE Press, 2008, Art. 31, pp. 1-11.
L. Grigori, J. Demmel, and H. Xiang, "Communication avoiding Gaussian elimination," in Proc. 2008 ACM/IEEE Conf. on Supercomputing (SC '08), Piscataway, NJ: IEEE Press, 2008, Art. 29, pp. 1-12.
J. Demmel, M. Hoemmen, M. Mohiyuddin, and K. A. Yelick, "Avoiding communication in sparse matrix computations," in Proc. 22nd IEEE Intl. Parallel & Distributed Processing Symp. (IPDPS 2008), Piscataway, NJ: IEEE Press, 2008, 12 pp.
D. Garmire, H. Choo, R. Kant, S. Govindjee, C. H. Séquin, R. S. Muller, and J. Demmel, "Diamagnetically levitated MEMS accelerometers (Poster Paper)," in 14th IEEE Intl. Conf. on Solid-State Sensors, Actuators and Microsystems (TRANSDUCERS 2007) Digest of Technical Papers, Piscataway, NJ: IEEE Press, 2007, pp. 1203-1206.
S. Kim, S. Pakzad, D. E. Culler, J. Demmel, G. Fenves, S. Glaser, and M. Turon, "Health monitoring of civil infrastructures using wireless sensor networks," in Proc. 6th Intl. Symp. on Information Processing in Sensor Networks (IPSN 2007), New York, NY: The Association for Computing Machinery, Inc., 2007, pp. 254-263.
S. Kim, S. Pakzad, D. E. Culler, J. Demmel, G. Fenves, S. Glaser, and M. Turon, "Wireless sensor networks for structural health monitoring," in Proc. 4th Intl. Conf. on Embedded Networked Sensor Systems (SenSys '06), New York, NY: The Association for Computing Machinery, Inc., 2006, pp. 427-428.
J. Demmel, I. Dumitriu, and O. Holtz, "Toward accurate polynomial evaluation in rounded arithmetic (Short report)," in Algebraic and Numerical Algorithms and Computer-Assisted Proofs, B. Buchberger, S. Oishi, M. Plum, and S. M. Rump, Eds., Dagstuhl Seminar Proceedings, Dagstuhl, Germany: IBFI, 2006, pp. 1-15.
T. Koyama, D. S. Bindel, W. He, E. P. Quevy, S. Govindjee, J. Demmel, and R. T. Howe, "Simulation tools for damping in high frequency resonators," in Proc. 4th IEEE Conf. on Sensors (SENSORS 2005), Piscataway, NJ: IEEE Press, 2005, pp. 349-352.
S. N. Pakzad, S. Kim, G. L. Fenves, S. D. Glaser, D. E. Culler, and J. Demmel, "Multi-purpose wireless accelerometers for civil infrastructure monitoring," in Structural Health Monitoring 2005: Proc. 5th Intl. Workshop (IWSHM 2005), F. K. Chang, Ed., Lancaster, PA: DEStech Publications, Inc., 2005, pp. 125-132.
D. S. Bindel, E. Quevy, T. Koyama, S. Govindjee, J. Demmel, and R. T. Howe, "Anchor loss simulation in resonators," in 18th IEEE Intl. Conf. on Micro Electro Mechanical Systems Technical Digest (MEMS 2005), Piscataway, NJ: IEEE Press, 2005, pp. 133-136.
J. V. Clark, D. Bindel, W. Kao, E. Zhu, A. Kuo, N. Zhou, J. Nie, J. Demmel, Z. Bai, S. Govindjee, K. Pister, M. Gu, and A. Agogino, "Addressing the needs of complex MEMS design," in Proc. 15th IEEE Intl. Conf. on Micro Electro Mechanical Systems, Piscataway, NJ: IEEE Press, 2002, pp. 204-209.
L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley, "ScaLAPACK Users' Guide," Philadelphia, PA: SIAM, 1997, 325 pp., ISBN 0-89871-397-8.
Technical Reports
R. Murray, J. Demmel, M. W. Mahoney, N. B. Erichson, M. Melnichenko, O. A. Malik, L. Grigori, P. Luszczek, M. Derezinski, M. E. Lopes, T. Liang, H. Luo, and J. Dongarra, "Randomized Numerical Linear Algebra: A Perspective on the Field With an Eye to Software," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2023-19, Feb. 2023.
Y. You, Z. Zhang, C. Hsieh, J. Demmel, and K. Keutzer, "ImageNet Training in Minutes," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2020-18, Jan. 2020.
Y. You, J. Demmel, K. Keutzer, C. Hsieh, C. Ying, and J. Hseu, "Large-Batch Training for LSTM and Beyond," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2018-138, Nov. 2018.
E. Carson, J. Demmel, L. Grigori, N. Knight, P. Koanantakool, O. Schwartz, and H. V. Simhadri, "Write-Avoiding Algorithms," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2015-163, June 2015.
G. Ballard, J. Demmel, L. Grigori, M. Jacquelin, H. D. Nguyen, and E. Solomonik, "Reconstructing Householder Vectors from Tall-Skinny QR," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2013-175, Oct. 2013.
G. Ballard, D. Becker, J. Demmel, J. Dongarra, A. Druinsky, I. Peled, O. Schwartz, S. Toledo, and I. Yamazaki, "Communication-Avoiding Symmetric-Indefinite Factorization," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2013-127, July 2013.
J. Demmel, A. Gearhart, O. Schwartz, and B. Lipshitz, "Perfect strong scaling using no additional energy," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2012-126, May 2012.
M. Anderson, G. Ballard, J. Demmel, and K. Keutzer, "Communication-Avoiding QR Decomposition for GPUs," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2010-131, Oct. 2010.
J. Demmel, M. F. Hoemmen, M. Mohiyuddin, and K. A. Yelick, "Avoiding Communication in Computing Krylov Subspaces," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2007-123, Oct. 2007.
J. W. Demmel, Y. Hida, W. Kahan, X. S. Li, S. Mukherjee, and E. J. Riedy, "Error Bounds from Extra Precise Iterative Refinement," EECS Department, University of California, Berkeley, Tech. Rep. UCB/CSD-04-1344, March 2005.
J. Demmel and Y. Hida, "Accurate Floating Point Summation," EECS Department, University of California, Berkeley, Tech. Rep. UCB/CSD-02-1180, May 2002.
D. Bindel, J. Demmel, W. M. Kahan, and O. Marques, "On Computing Givens Rotations Reliably and Efficiently," University of Tennessee, Knoxville, Computer Science Department, Tech. Rep. UTK/CS-00-449, Oct. 2000.
X. S. Li, J. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. M. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo, "Design, Implementation and Testing of Extended and Mixed Precision BLAS," Lawrence Berkeley National Laboratory, Tech. Rep. LBNL-00-45991, June 2000.
J. W. Demmel, J. Gilbert, and X. S. Li, "SuperLU Users' Guide," EECS Department, University of California, Berkeley, Tech. Rep. UCB/CSD-97-944, May 1997.
J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, and J. W. Liu, "A Supernodal Approach to Sparse Partial Pivoting," EECS Department, University of California, Berkeley, Tech. Rep. UCB/CSD-95-883, July 1995.
J. W. Demmel, M. T. Heath, and H. A. van der Vorst, "Parallel Numerical Linear Algebra," EECS Department, University of California, Berkeley, Tech. Rep. UCB/CSD-92-703, Oct. 1992.
J. Demmel and Z. Bai, "LAPACK Working Note 38: On a Direct Algorithm for Computing Invariant Subspaces with Specified Eigenvalues," University of Tennessee, Knoxville, Computer Science Department, Tech. Rep. UT-CS-91-139, Aug. 1991.
J. Demmel, J. Dongarra, and W. M. Kahan, "LAPACK Working Note 39: On Designing Portable High Performance Numerical Libraries," University of Tennessee, Knoxville, Department of Computer Science, Tech. Rep. UTK/CS-91-141, Aug. 1991.
E. Anderson, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, S. Hammarling, and W. M. Kahan, "LAPACK Working Note 26: Prospectus for an Extension to LAPACK: A Portable Linear Algebra Library for High-Performance Computers," University of Tennessee, Knoxville, Department of Computer Science, Tech. Rep. UTK/CS-90-118, Nov. 1990.
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, J. Langou, J. Langou, A. McKenney, S. Ostrouchov, and D. Sorensen, "LAPACK, Linear Algebra PACKage," 2006.
H. Choo, D. Garmire, R. S. Muller, and J. Demmel, "CMOS-compatible high-performance microscanners, including structures, high-yield simplified fabrication methods and applications," U.S. Patent Application, July 2006.
J. Demmel, W. M. Kahan, and B. N. Parlett, "Forsythe, Golub, and the Future of Matrix Computations," presented at Matrix Computations & Scientific Computing Seminar, 380 Soda Hall, March 2007.