I am a Computer Science PhD student at AMPLab, EECS, UC Berkeley, advised by Michael I. Jordan. My research interests encompass Machine Learning and Big Data problems, including designing scalable machine learning algorithms for deployment in large-scale systems. Specifically, my current research focuses on adapting concurrency control concepts from databases to parallelize inherently sequential machine learning algorithms, in order to maximize scalability while preserving correctness and theoretical guarantees.
Prior to my PhD studies, I worked as a research scientist at DSO National Laboratories, Singapore. As part of a collaboration with the Future Urban Mobility project at SMART (Singapore-MIT Alliance for Research and Technology), I worked with Javed Aslam and Daniela Rus on mining travel patterns using data collected from a roving sensor network of taxi probes.
I obtained my BS and MS in Computer Science at Carnegie Mellon University, where I was advised by Priya Narasimhan. As part of my thesis work, I developed a framework for localizing and diagnosing faulty nodes in a MapReduce cluster, based on OS-level performance counters, white-box metrics extracted from logs, and application-level heartbeats. The fault diagnosis framework was able to capture a variety of faults, including resource hogs and application hangs, and to localize each fault to a subset of worker nodes in a Hadoop system.
Our paper on "Parallel Double Greedy Submodular Maximization" has been accepted at NIPS 2014, Montreal, Quebec, Canada.
We will be making the paper and code available on this website soon.
We're organizing the workshop on Big Learning: Advances in Algorithms and Data Management to be held at NIPS, Lake Tahoe, NV, on December 9 or 10, 2013. This year, the workshop aims to bring together the Large-scale Machine Learning and Database Systems communities to facilitate the cross-pollination of ideas.