Reynold S. Xin

I am a PhD student at UC Berkeley in the AMPLab and the Database Group, advised by Michael Franklin. Prior to Berkeley, I had three short engineering stints at Google, IBM, and Altera. I enjoy traveling and playing badminton and squash.

Projects

Below is a list of projects that I actively contribute to. Most are open source under the BSD or Apache 2 license.

Shark: An open source SQL analytics system that marries query processing with complex analytics (e.g., machine learning) on large clusters. It uses Spark as the physical execution engine and can run HiveQL queries up to 100x faster than Hive without losing the fault-tolerance and scale-out properties of MapReduce.

GraphX: A distributed graph computation engine built on top of Spark that can significantly simplify graph computation programming. Its concise APIs enable users to express graph algorithms such as PageRank in 5 lines of code. It supports both interactive graph mining and efficient graph computations in a single runtime.

Spark: An open source cluster computing engine that makes data analytics fast: both fast to run and fast to write. It provides an efficient abstraction for distributed in-memory computation and can run up to 100x faster than Hadoop MapReduce for data-intensive applications. Due to my work on Shark and GraphX, I am a primary contributor to Spark.

CrowdDB: A pioneering database system that incorporates crowd-sourced query processing. The project presents a vision in which humans are simply resources database systems can use to answer queries.

Readings in Databases: I maintain an online list of papers essential to understanding database systems.

Recent Publications

Talks