I am a PhD student at UC Berkeley in the AMPLab and the Database Group, advised by Michael Franklin. Prior to Berkeley, I had three short engineering stints at Google, IBM, and Altera. I enjoy traveling and playing badminton and squash.
Projects
Below is a list of projects that I actively contribute to. Most are open source under the BSD or Apache 2 license.
Shark: An open source SQL analytics system that marries query processing with complex analytics (e.g. machine learning) on large clusters. It uses Spark as the physical execution engine and can run Hive QL queries up to 100x faster without losing the fault-tolerance and scale-out properties of MapReduce.
GraphX: A distributed graph computation engine built on top of Spark that can significantly simplify graph computation programming. Its concise APIs enable users to express graph algorithms such as PageRank in 5 lines of code. It supports both interactive graph mining and efficient graph computations in a single runtime.
Spark: An open source cluster computing engine that makes data analytics fast — both fast to run and fast to write. It provides an efficient abstraction for distributed in-memory computation and can run 100x faster than Hadoop for data-intensive applications. Due to my work on Shark and GraphX, I am a primary contributor to Spark.
CrowdDB: A pioneering database system that incorporates crowd-sourced query processing. The project presents a vision in which humans are simply resources database systems can use to answer queries.
Readings in Databases: I maintain an online list of papers essential to understanding database systems.
Recent Publications
- Shark: SQL and Rich Analytics at Scale. R. Xin, J. Rosen, M. Zaharia, M. J. Franklin, S. Shenker, I. Stoica. SIGMOD 2013.
- The Case for Tiny Tasks in Compute Clusters. K. Ousterhout, A. Panda, J. Rosen, S. Venkataraman, R. Xin, S. Ratnasamy, S. Shenker, I. Stoica. HotOS 2013.
- Finding Related Tables. A. Das Sarma, L. Fang, N. Gupta, A. Halevy, H. Lee, F. Wu, R. Xin, C. Yu. SIGMOD Industrial Track 2012.
- Shark: Fast Data Analysis Using Coarse-grained Distributed Memory. C. Engle, A. Lupher, R. Xin, M. Zaharia, M. J. Franklin, S. Shenker, I. Stoica. SIGMOD 2012 Best Demo Award.
- CrowdDB: Query Processing with the VLDB Crowd. M. J. Franklin, D. Kossmann, T. Kraska, S. Madden, S. Ramesh, R. Xin. VLDB 2011 Best Demo Award.
- CrowdDB: Answering Queries with Crowdsourcing. M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, R. Xin. SIGMOD 2011.
- MEET DB2: Automated Database Migration Evaluation. R. Xin, P. Dantressangle, S. Lightstone, W. McLaren, S. Schormann, M. Schwenger. VLDB 2010 Industrial Track.
Talks
- Lightning-fast data analytics using the Berkeley Data Analytics Stack (BDAS)
  - Hadoop Summit, San Jose, June 2013
- Shark: SQL and Rich Analytics at Scale
  - Lawrence Livermore National Laboratory (LLNL), July 2013
  - SIGMOD, New York, June 2013
  - Oracle, Redwood City, May 2013
- The Spark Stack: Fast and Expressive Big Data Analytics in Scala
  - Scala Days, New York, June 2013
- Making Big Data Analytics Interactive and Real-time
  - Big Data Analytics, Microsoft Research, Cambridge, May 2013
- Spark and Shark: High-speed In-memory Analytics over Hadoop Data
  - Huawei, Shenzhen, April 2013
  - Intel, Shanghai, March 2013
  - Intel, Beijing, March 2013
  - Tsinghua University, Beijing, March 2013
  - Yahoo, Sunnyvale, Oct 2012
- AMPCamp (Big Data bootcamp for practitioners)
  - ECNU, Shanghai, March 2013
  - Strata Conference, Santa Clara, February 2013
  - Berkeley, Aug 2012
- Introduction to Shark: Hive on Spark
  - Apple, Cupertino, February 2013
  - IBM TJ Watson Research Center, July 2012
  - LinkedIn, Mountain View, June 2012
  - Mozilla, Mountain View, May 2012
  - SIGMOD (demo), Scottsdale, May 2012
  - Foursquare, San Francisco, May 2012
  - Spark User Meetup at Palantir, Palo Alto, April 2012
  - Facebook, Menlo Park, February 2012