Joseph E. Gonzalez
A picture of Joseph Gonzalez in Seattle.

Postdoc, UC Berkeley AMPLab

CV / Research Statement
Email Address:

I am excited to announce that I will be joining the faculty at UC Berkeley as an Assistant Professor in EECS starting in January 2016!


I am an assistant professor at UC Berkeley, where I am continuing work on large-scale systems for machine learning as well as the GraphLab project. I am also a founder of Dato Inc., which was started based on my thesis research. As a graduate student I worked with Carlos Guestrin in the Machine Learning Department at Carnegie Mellon University (CMU). My research addresses the challenges of designing and building large-scale machine learning algorithms and systems. In particular, my thesis focuses on large-scale structured machine learning using probabilistic graphical models that are capable of reasoning about billions of related random variables. The resulting algorithms and systems have achieved state-of-the-art performance in tasks ranging from predicting ad preferences in social networks to solving complex protein modeling tasks. As part of my thesis work we created GraphLab, a framework that dramatically simplifies the design and implementation of high-performance large-scale machine learning systems.

I am a recipient of the AT&T Labs Graduate Research Fellowship and the National Science Foundation Graduate Research Fellowship. Some of my work is also supported by the ONR Young Investigator Program grant N00014-08-1-0752, the ARO under MURI W911NF0810242, and the ONR PECASE-N00014-10-1-0672.

I completed my BS in computer science at the California Institute of Technology (Caltech) and my MS in Machine Learning at Carnegie Mellon University. As part of my Master's work I developed non-parametric Bayesian models to estimate wireless signal quality in sensor networks.


  • I am presenting the 2014 ICML Tutorial on Emerging Systems for Large-Scale Machine Learning. A draft of the talk is available as pptx (with animations) and as pdf.
  • I completed my thesis defense! Check out the heavily illustrated/animated presentation.
  • I am co-organizing the NIPS’12 Big Learning Workshop.
  • I am co-organizing the NIPS’11 Big Learning Workshop.
  • Check out our new parallel machine learning framework: GraphLab.
  • We protested important machine learning issues at the G20 in Pittsburgh. To learn more about how you can Support Vector Machines, check out our entertaining pictures.


  • Neeraja J. Yadwadkar, Bharath Hariharan, Joseph Gonzalez and Randy Katz (2015). "Faster Jobs in Distributed Data Processing using Multi-Task Learning" Conference: SIAM International Conference on Data Mining (SDM15). [Paper]
  • Dan Crankshaw, Peter Bailis, Joseph Gonzalez, Haoyuan Li, Zhao Zhang, Michael Franklin, Ali Ghodsi, and Michael Jordan (2015). "The missing piece in complex analytics: Low latency, scalable model management and serving with Velox." Conference: Conference on Innovative Data Systems Research (CIDR). [Paper]
  • Xinghao Pan, Stefanie Jegelka, Joseph E. Gonzalez, Joseph K. Bradley, and Michael I. Jordan (2014). "Parallel double greedy submodular maximization." Advances in Neural Information Processing Systems (NIPS). [Paper] [code]
  • Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, Ion Stoica (2014). "GraphX: Graph Processing in a Distributed Dataflow Framework." Proceedings of Operating Systems Design and Implementation (OSDI). [Paper]
  • Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, and Michael I. Jordan (2013). "Optimistic concurrency control for distributed unsupervised learning." Advances in Neural Information Processing Systems (NIPS) 26, 2013. [Paper]
  • Evan Sparks, Ameet Talwalkar, Virginia Smith, Xinghao Pan, Joseph Gonzalez, Tim Kraska, Michael I. Jordan, and Michael J. Franklin (2013). "MLI: An API for distributed machine learning." IEEE International Conference on Data Mining (ICDM). [Paper]
  • Reynold Xin, Joseph Gonzalez, Michael Franklin, Ion Stoica (2013). "GraphX: A Resilient Distributed Graph System on Spark." SIGMOD 2013 GRADES Workshop. [Paper]
  • Joseph Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, Carlos Guestrin (2012). "PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs." Proceedings of Operating Systems Design and Implementation (OSDI). [GraphLab2 (PowerGraph)] [abs/bib] [pdf]
  • Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin and Joseph M. Hellerstein (2012). "Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud." Proceedings of Very Large Data Bases (PVLDB). [code release] [abs/bib] [pdf]
  • Amr Ahmed, Mohamed Aly, Joseph Gonzalez, Shravan Narayanamurthy, Alex Smola (2012). "Scalable Inference in Latent Variable Models." Conference on Web Search and Data Mining (WSDM). [bibtex] [pdf]
  • Joseph Gonzalez, Yucheng Low, Arthur Gretton, Carlos Guestrin (2011). "Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees." In Artificial Intelligence and Statistics (AISTATS). [code release] [abs/bib] [pdf] [pptx]
  • Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Joseph M. Hellerstein (2010). "GraphLab: A New Parallel Framework for Machine Learning." Conference on Uncertainty in Artificial Intelligence (UAI). [code release] [abs/bib] [pdf]
  • Joseph Gonzalez, Yucheng Low, Carlos Guestrin (2010). "Parallel Inference on Large Factor Graphs." Book chapter in Scalable Machine Learning.
  • Joseph Gonzalez, Yucheng Low, Carlos Guestrin, David O'Hallaron (2009). "Distributed Parallel Inference on Large Factor Graphs." Conference on Uncertainty in Artificial Intelligence (UAI). [abs/bib] [pdf] [pptx]
  • Joseph Gonzalez, Yucheng Low, and Carlos Guestrin (2009). "Residual Splash for Optimally Parallelizing Belief Propagation." In Artificial Intelligence and Statistics (AISTATS). [abs/bib] [pdf] [pptx]


  • Invited Talk: Optimistic Concurrency Control in the Design and Analysis of Parallel Learning Algorithms. [pptx with animations, pdf] Information Theory and Applications (ITA) Workshop. 2015
  • The Missing Piece in Complex Analytics: Scalable, Low Latency Model Serving and Management with Velox. [keynote with animations, pdf] Conference on Innovative Data Systems Research (CIDR'15).
  • GraphX: Graph Processing in a Distributed Dataflow Framework. [pptx with animations, pdf] Proceedings of Operating Systems Design and Implementation (OSDI'14)
  • Invited Talk: "Concurrency Control For Scalable Bayesian Inference" [pptx with animations, pdf] Annual meeting of the International Society for Bayesian Analysis (ISBA) 2014
  • ICML Tutorial: "Emerging Systems for Large-Scale Machine Learning" [pptx with animations, pdf] International Conference on Machine Learning (ICML) 2014
  • Invited Talk: "GraphX: Unifying Table and Graph Analytics" [pptx with animations, pdf] Session on Graph Algorithms Building Blocks at the International Parallel and Distributed Processing Symposium (IPDPS) 2014
  • Keynote Speaker: "From Graphs to Tables: The Design of Scalable Systems for Graph Analytics." [pptx with animations, pdf] Workshop on Big Graph Mining at the International World Wide Web Conference (WWW) 2014.
  • Guest Lecture: "Linear Regression and the Bias Variance Tradeoff" [with animations, pdf] Berkeley class on Statistical Learning Theory
  • Thesis Defense Talk: "Parallel and Distributed Systems for Probabilistic Reasoning" [PPTX] 11/26/2012
  • OSDI Talk on PowerGraph (GraphLab2) "PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs" [PPTX] [Extended PPTX] 10/7/2012
  • Guest Lecture for the Berkeley class Analyzing Big Data with Twitter. "Big Learning with Graphs" [PPTX + Video] 10/2/2012
  • AMPLab retreat talk on GraphLab2 "GraphLab2: A distributed framework for graph-parallel big-learning on natural graphs." [PPTX] 5/18/2012
  • Class lecture on Big Learning with Graphs presented to the CMU class "Machine Learning with Large Datasets" [PPTX] 3/8/2012
  • GraphLab2 Talk at CMU [PPTX] 10/18/2011
  • GraphLab talk at the IDGA Data Center Consolidation Summit. [PPTX] 10/3/2011
  • Early GraphLab2 Talk at Yahoo! Research. [PPTX] 9/9/2011
  • GraphLab talk at Berkeley. [PPTX] 9/7/2011
  • GraphLab talk at Greenplum EMC. [PPTX] 8/24/2011
  • GraphLab talk at LinkedIn. [PPTX] 8/2/2011
  • GraphLab talk at Cloudera. [PPTX] 7/29/2011
  • GraphLab talk at Facebook. [PPTX] 7/19/2011
  • Thesis Proposal [PPTX]
  • Splash Gibbs Sampling. [PPTX] AISTATS 2011
  • Parallel Belief Propagation [PPTX] ML Lunch 2009
  • Splash Belief Propagation [PPTX] UAI 2009