
About Me

I am a PhD student in the Parallel Computing Lab (ParLab) at UC Berkeley, working with Professor Kurt Keutzer. My research interests are in efficient parallel algorithm design and optimization on a variety of parallel platforms, including GPUs, multi-core CPUs, and commodity clusters. In particular, I work on finding efficient parallelization techniques for applications in audio content analysis, including speech recognition and music information retrieval, as well as in web data mining and natural language processing. I am interested in developing a set of application frameworks that enable efficient and scalable parallel application development for application writers. Prior to UC Berkeley, I received a Bachelor of Science degree from the University of Illinois at Urbana-Champaign, where I worked with Professor Laxmikant Kale in the Parallel Programming Lab (PPL).

Resume available upon request.



Software

My ongoing project, PyCASP (Python-based Content Analysis using SPecialization), aims to bring efficiency and portability to productive application development.

Fast Gaussian Mixture Model (GMM) training using Python (described in our HotPar'11 paper) is available for download. For documentation, installation instructions, and examples, please check the wiki.


Courses taken at UC Berkeley

Spring 2012
EE225D: Audio Signal Processing in Humans and Machines (N. Morgan)
STAT151B: Modern Statistical Prediction and Machine Learning (J. McAuliffe)

Fall 2011
CS298: Acoustic Methods for Video Analysis (G. Friedland)

Fall 2010
CS294: Productive Parallel Programming (David Patterson, Armando Fox)

Spring 2010
CS270: Combinatorial Algorithms and Data Structures (Richard Karp)
CS294: Architecting Parallel Software using Patterns (Kurt Keutzer)
CS298: Readings in Spoken Language Processing (Nelson Morgan)

Fall 2009
CS281A/Stat241: Statistical Learning Theory (Peter Bartlett)

Spring 2009
CS267: Applications of Parallel Computing (James Demmel)
CS294: Special Topics - Patterns for Parallel Programming (Kurt Keutzer)

Fall 2008
CS262A: Advanced Topics in Computer Systems (Eric Brewer)
CS280: Computer Vision (Jitendra Malik)

Industry Experience and Projects


Productive Gaussian Mixture Model Training with Applications in Speaker Diarization UC Berkeley ParLab

Typically, scientists with computational needs prefer to use high-level languages such as Python or MATLAB; however, large computationally intensive problems must eventually be recoded in a low-level language such as C or Fortran by expert programmers in order to achieve sufficient performance. In addition, multiple strategies may exist for mapping a problem onto parallel hardware depending on the input data size and the hardware parameters. We show how to preserve the productivity of high-level languages while obtaining the performance of the best low-level code variant for a given hardware platform and problem size using SEJITS, a set of techniques that leverages just-in-time code generation and compilation. As a case study, we demonstrate our technique for Gaussian Mixture Model training using the EM algorithm. With the addition of one line of code to import our framework, a domain programmer using an existing Python GMM library can run her program unmodified on a GPU-equipped computer and achieve performance that meets or beats GPU code hand-crafted by a human expert. We also show that despite the overhead of allowing the domain expert's program to use Python, and the overhead of just-in-time code generation and compilation, our approach still results in performance competitive with hand-crafted GPU code.
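To make the case study concrete, the EM training loop for a GMM can be sketched in plain Python. This is an illustrative 1-D sketch of the algorithm only; the function name and structure are ours, not the framework's API:

```python
import math
import random

def em_gmm_1d(data, k=2, iters=50):
    """Plain EM for a 1-D Gaussian mixture; returns (weights, means, variances)."""
    n = len(data)
    srt = sorted(data)
    # Spread the initial means across the data quantiles
    means = [srt[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    vars_ = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each data point
        resp = []
        for x in data:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, vars_)]
            s = sum(p) or 1e-300  # guard against floating-point underflow
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            vars_[j] = max(sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(resp, data)) / nj, 1e-6)
    return weights, means, vars_

# Synthetic data: two well-separated 1-D clusters around 0 and 10
rng = random.Random(1)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)] +
        [rng.gauss(10.0, 1.0) for _ in range(200)])
w, m, v = em_gmm_1d(data, k=2)
```

The E-step over data points and the M-step reductions are the parallelizable hot spots that the GPU code variants target.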


Machine Translation Decoder on Parallel Platforms UC Berkeley ParLab

Translation between natural languages is one of the most computationally challenging problems in natural language processing. State-of-the-art machine translators use statistical knowledge of languages, in the form of language rules and translation rules, to guide the translation process bottom-up from single words to the whole sentence. The machine translation algorithm explores a discrete set of possible translations by traversing the language and translation models, represented as graphs. Unlikely translations are pruned at every level to keep the set of translations manageable. Each translation rule evaluation at a given level of translation granularity is independent, so when parallelizing a machine translation application, each rule evaluation can be mapped to its own thread. Each rule evaluation requires several language and translation model memory accesses; the number of these accesses and their exact pattern are dynamically determined by the input sentence and the exact pruning mechanisms employed. We are currently working on efficiently parallelizing the machine translation algorithm on Nvidia GPUs.
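A hypothetical, much-simplified sketch of the level-wise expand-and-prune step described above (the names, scoring, and rule format are illustrative, not our actual decoder):

```python
import heapq

def expand_level(hypotheses, rules, beam_size):
    """Score every (hypothesis, rule) pair independently, then prune to the beam."""
    scored = [(text + word, score + cost)       # independent rule evaluations:
              for text, score in hypotheses     # this double loop is the part
              for word, cost in rules]          # that maps one thread per pair
    # Pruning: keep only the beam_size highest-scoring partial translations
    return heapq.nlargest(beam_size, scored, key=lambda h: h[1])

# One expansion step from an empty hypothesis with three candidate rules
level1 = expand_level([("", 0.0)],
                      [("a", -1.0), ("b", -0.5), ("c", -2.0)],
                      beam_size=2)
```

On a GPU, the scoring loop becomes a parallel kernel while the pruning step requires a parallel reduction or sort.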

Large-Scale Offer-Matching Algorithm on Clusters of Multicore Processors MSR Search Labs

Offer matching is one of the key applications in e-commerce search engines. The search engine receives tens of millions of textual product descriptions (offers) every day from various merchants. In order to display the offers on a web search page, they need to be matched to a catalog that contains structured data for products in various categories. The new set of product offers has to be published daily to keep the content of the search engine up-to-date. We developed a parallel version of the offer-matching algorithm using DryadLINQ and .NET multithreading, achieving significant performance improvements and enabling matching of much larger data sets on clusters of multicore processors.
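The matching itself is independent across offers, which is what makes cluster-level parallelism effective. A toy Python sketch of the idea, with a hypothetical catalog, token-overlap scoring, and a thread pool standing in for the DryadLINQ/.NET infrastructure:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy catalog: product id -> set of descriptive tokens
catalog = {
    "p1": {"apple", "iphone", "4", "16gb"},
    "p2": {"canon", "eos", "camera"},
}

def match_offer(offer_text):
    """Match one textual offer to the catalog product sharing the most tokens."""
    tokens = set(offer_text.lower().split())
    best = max(catalog.items(), key=lambda kv: len(tokens & kv[1]))
    return best[0] if tokens & best[1] else None  # None if nothing overlaps

offers = ["Apple iPhone 4 16GB black", "Canon EOS digital camera", "unknown gadget"]
# Each offer is matched independently, so the map parallelizes trivially
with ThreadPoolExecutor() as pool:
    matches = list(pool.map(match_offer, offers))
```

The real system uses far richer similarity features and category structure; the point of the sketch is only the embarrassingly parallel per-offer map.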

Large Vocabulary Continuous Speech Recognition on Parallel Platforms UC Berkeley ParLab

Parallel scalability allows an application to efficiently utilize an increasing number of processing elements. In this work we explore a design space of application scalability for an inference engine in large vocabulary continuous speech recognition (LVCSR). Our implementation of the inference engine involves a parallel graph traversal through an irregular graph-based knowledge network with millions of states and arcs. The challenge is not only to define a software architecture that exposes sufficient fine-grained application concurrency, but also to efficiently synchronize between an increasing number of concurrent tasks and to effectively exploit the parallelism opportunities in today's highly parallel processors. We propose four application-level implementation alternatives we call "algorithm styles" and construct highly optimized implementations on two parallel platforms: an Intel Core i7 multicore processor and an NVIDIA GTX280 manycore processor. The highest-performing algorithm style varies with the implementation platform. On a 44-minute speech data set, we demonstrate substantial speedups of 3.4x on the Core i7 and 10.5x on the GTX280 compared to a highly optimized sequential implementation on the Core i7, without sacrificing accuracy. The parallel implementations contain less than 2.5% sequential overhead, promising scalability and significant potential for further speedup on future platforms.
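One step of the frame-synchronous graph traversal can be sketched as follows; this is our own simplification for illustration, not the actual inference engine code:

```python
def traverse_step(active, arcs, obs_scores, beam):
    """Expand active states across outgoing arcs, keep the best score per
    destination state, and prune everything outside the beam."""
    nxt = {}
    for state, score in active.items():
        for dst, arc_cost, label in arcs.get(state, []):
            s = score + arc_cost + obs_scores.get(label, 0.0)
            if dst not in nxt or s > nxt[dst]:
                nxt[dst] = s  # keep only the best path into dst
    if not nxt:
        return {}
    best = max(nxt.values())
    # Beam pruning: drop states too far below the best score this frame
    return {st: sc for st, sc in nxt.items() if sc >= best - beam}

# Tiny example: one active state with two outgoing arcs; beam prunes the worse one
active = {0: 0.0}
arcs = {0: [(1, -1.0, "a"), (2, -5.0, "b")]}
obs = {"a": 0.0, "b": 0.0}
nxt = traverse_step(active, arcs, obs, beam=2.0)
```

The arc expansions are independent, but the "best score per destination" update is the write conflict that the four algorithm styles resolve in different ways.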


Parallel Crank-Nicolson Method for Option Pricing Intel Labs, UC Berkeley ParLab

The Crank-Nicolson method is an implicit, second-order finite-difference method for solving the PDEs that arise in option pricing, an application in the domain of computational finance. We explored both SIMD and core-level parallelization techniques for the underlying computation on three parallel platforms: Intel Core i7 (Nehalem), Larrabee, and the Nvidia GTX 280. Linear scaling was achieved on all three platforms. However, while there was enough parallelism to efficiently utilize both Nehalem and Larrabee, a similar implementation on the GTX 280 did not achieve as large a performance improvement. An algorithmic transformation is therefore required to obtain significant speedup and scaling on the Nvidia GPU for this problem. This project was a collaboration with the Throughput Computing Lab at Intel, Santa Clara.
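Each Crank-Nicolson time step reduces to solving a tridiagonal linear system, which the Thomas algorithm handles in linear time. A minimal sketch of that solve (illustrative only, not the optimized kernel):

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system Ax = d, where a is the sub-diagonal
    (a[0] unused), b the diagonal, and c the super-diagonal (c[-1] unused).
    Each Crank-Nicolson time step performs one such solve per option grid."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    # Forward sweep: eliminate the sub-diagonal
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# 3x3 system with known solution [1, 1, 1]
x = thomas([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [3.0, 4.0, 3.0])
```

The forward/backward sweeps are inherently sequential along one grid dimension, which is precisely why the GPU mapping needed an algorithmic transformation rather than a direct port.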


Parallel Prim's Algorithm on Large Clusters UIUC Parallel Programming Lab

This research project consists of the parallel implementation and analysis of Prim's algorithm for finding a minimum spanning tree of a dense graph using MPI. The algorithm uses a novel extension, adding multiple vertices per iteration, to achieve significant performance improvements on large problems (up to 200,000 vertices). Experimental results illustrate the advantages of this approach on large complete graphs using over a thousand processors. Some limiting factors are shown as well, such as the bound on the number of vertices that can be added per iteration, which depends on the problem size.
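For reference, a minimal sequential Prim's algorithm in Python; the parallel MPI version distributes the minimum-edge search across processors and extends this loop to add multiple vertices per iteration (this sketch is illustrative, not the project code):

```python
import heapq

def prim_mst_weight(n, adj):
    """Total weight of a minimum spanning tree of a connected graph.
    adj maps each vertex to a list of (neighbor, weight) pairs."""
    visited = [False] * n
    pq = [(0, 0)]  # (edge weight, vertex); start growing the tree from vertex 0
    total = 0
    while pq:
        w, u = heapq.heappop(pq)
        if visited[u]:
            continue            # stale entry for an already-added vertex
        visited[u] = True
        total += w              # the classic version adds ONE vertex per step
        for v, wt in adj[u]:
            if not visited[v]:
                heapq.heappush(pq, (wt, v))
    return total

# Square graph: MST is edges 0-1 (1), 1-2 (2), 2-3 (3), total weight 6
adj = {0: [(1, 1), (3, 4), (2, 5)], 1: [(0, 1), (2, 2)],
       2: [(1, 2), (3, 3), (0, 5)], 3: [(2, 3), (0, 4)]}
total = prim_mst_weight(4, adj)
```

The multi-vertex extension batches several provably safe additions per iteration, which amortizes the communication cost of the distributed minimum search.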


Women in CS

In the United States, the number of women in engineering and technology fields has been steadily declining. In Computer Science in particular, women's involvement has dropped dramatically (to below 20%) over the last decade. With the increasing importance and integration of technology in our society, it is crucial to promote gender diversity in technical fields. Across the country, a number of outreach programs aim to increase K-12 girls' interest in studying technical subjects. Many universities have local student chapters promoting gender diversity and working to increase the retention rate of women studying computer science and engineering. I believe it is important to work toward greater representation of women in computing by starting locally and getting involved in such programs. If such programs don't exist in your area, there are a number of nationwide resources that can help change that.

I have been involved in such organizations since 2005, starting as an undergrad at UIUC, where I was technical director and treasurer for the Women in Computer Science (WCS) student organization and took part in the ChicTech outreach program for high school girls interested in science and technology. At UC Berkeley, I was Co-President of WICSE, the Women in Computer Science and Engineering student organization, for the 2009-2010 academic year. WICSE is a networking and advocacy organization for graduate women in the EECS department. It places strong emphasis on promoting gender diversity and increasing retention rates of women in technical fields by providing mentoring programs and participating in outreach programs such as Girls Go Tech and Expand Your Horizons.