About Me
I am a PhD student in the Parallel Computing Lab (ParLab) at UC Berkeley, working with Professor Kurt Keutzer. My research interests are in efficient parallel algorithm design and optimization on a variety of parallel platforms, including GPUs, multi-core CPUs, and commodity clusters. In particular, I work on finding efficient parallelization techniques for applications in audio content analysis, including speech recognition and music information retrieval, as well as in web data mining and natural language processing. I am also interested in developing application frameworks that enable application writers to build efficient and scalable parallel applications. Prior to UC Berkeley, I received a Bachelor of Science degree from the University of Illinois at Urbana-Champaign, where I worked with Professor Laxmikant Kale in the Parallel Programming Lab (PPL).
Resume available upon request.
Education
-
University of California, Berkeley
PhD, Computer Science, Expected: May 2013
Advisor: Kurt Keutzer
-
University of California, Berkeley
MS in Computer Science and Electrical Engineering, Received: December 2011
Advisor: Kurt Keutzer
Thesis: Fast Speaker Diarization Using a Specialization Framework for Gaussian Mixture Model Training
-
University of Illinois, Urbana-Champaign
BS, Computer Science, Received: May 2008
Advisor: Laxmikant Kale
Thesis: Parallel Performance Analysis of Parallel Message Driven Applications
Software
PyCASP (Python-based Content Analysis using SPecialization) is an ongoing project that aims to bring efficiency and portability to productive application development. Fast Gaussian Mixture Model (GMM) training in Python (described in our HotPar'11 paper) is available here. For documentation, installation instructions, and examples, please check the wiki.
Publications
-
Scalable Multimedia Content Analysis on Parallel Platforms
Ekaterina Gonina, Gerald Friedland, Eric Battenberg, Penporn Koanantakool, Michael Driscoll, Evangelos Georganas, Kurt Keutzer.
To appear in ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP), 2013.
-
Portable Parallel Performance from Sequential, Productive, Embedded Domain Specific Languages
Shoaib Kamil, Derrick Coetzee, Scott Beamer, Henry Cook, Ekaterina Gonina, J. Harper, Jeffrey Morlan, Armando Fox.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'12), 2012 (extended abstract).
-
Fast Speaker Diarization Using a Specialization Framework for Gaussian Mixture Model Training
Ekaterina Gonina
Master's Thesis. EECS Department, University of California, Berkeley. December 12, 2011.
-
Fast Speaker Diarization Using a High-Level Scripting Language
Ekaterina Gonina, Gerald Friedland, Henry Cook, Kurt Keutzer
In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), December 11-15, 2011, Waikoloa, Hawaii.
-
Parallelizing Large-Scale Data Processing Applications with Data Skew: A Case Study in Product-Offer Matching
Ekaterina Gonina, Anitha Kannan, John Shafer, Mihai Budiu
In Proceedings of the 2nd International Workshop on MapReduce and its Applications (MapReduce'11), June 8, 2011, San Jose, CA.
-
CUDA-level Performance with Python-level Productivity for Gaussian Mixture Model Applications
Henry Cook, Ekaterina Gonina, Shoaib Kamil, Gerald Friedland, David Patterson, Armando Fox.
In Proceedings of the 3rd USENIX Conference on Hot Topics in Parallelism (HotPar'11), USENIX Association, Berkeley, CA, USA.
-
Considerations When Evaluating Microprocessor Platforms
Michael Anderson, Bryan Catanzaro, Jike Chong, Ekaterina Gonina, Kurt Keutzer, Chao-Yue Lai, Mark Murphy, David Sheffield, Bor-Yiing Su, Narayanan Sundaram
In Proceedings of the 3rd USENIX Conference on Hot Topics in Parallelism (HotPar'11), USENIX Association, Berkeley, CA, USA.
-
Efficient Automatic Speech Recognition on the GPU
Jike Chong, Ekaterina Gonina, Kurt Keutzer
Chapter in GPU Computing Gems Emerald Edition, Morgan Kaufmann, Vol. 1, February 9, 2011.
-
PALLAS: Mapping Applications onto Manycore
Michael Anderson, Bryan Catanzaro, Jike Chong, Ekaterina Gonina, Kurt Keutzer, Chao-Yue Lai, Matthew Moskewicz, Mark Murphy, Bor-Yiing Su, Narayanan Sundaram
In Multiprocessor System-on-Chip: Hardware Design and Tool Integration (Chapter 4), Springer, pages 89-114, December 2010.
-
Exploring Recognition Network Representations for Efficient Speech Inference on Highly Parallel Platforms
Jike Chong, Ekaterina Gonina, Kisun You, Kurt Keutzer
In Proceedings of the 11th Annual Conference of the International Speech Communication Association (InterSpeech), pages 1489-1492, Chiba, Japan, September 26-30, 2010.
-
Scalable Parallelization of Automatic Speech Recognition
Jike Chong, Ekaterina Gonina, Kisun You, Kurt Keutzer
Invited book chapter in Scaling Up Machine Learning, Cambridge University Press, 2010.
-
Monte Carlo Methods
Jike Chong, Ekaterina Gonina, Kurt Keutzer
2nd Annual Conference on Parallel Programming Patterns (ParaPLoP'10), Carefree, AZ, March 30, 2010
-
Parallel Scalability in Speech Recognition: Inference Engine in Large Vocabulary Continuous Speech Recognition
Kisun You, Jike Chong, Youngmin Yi, Ekaterina Gonina, Christopher Hughes, Wonyong Sung and Kurt Keutzer
IEEE Signal Processing Magazine, vol. 26, no. 6, pp. 124-135, November 2009.
-
A Fully Data Parallel WFST-based Large Vocabulary Continuous Speech Recognition on a Graphics Processing Unit
Jike Chong, Ekaterina Gonina, Youngmin Yi, Kurt Keutzer
In Proceedings of the 10th Annual Conference of the International Speech Communication Association (InterSpeech), pages 1183-1186, September 2009.
-
Scalable HMM based Inference Engine in Large Vocabulary Continuous Speech Recognition
Jike Chong, Kisun You, Youngmin Yi, Ekaterina Gonina, Christopher Hughes, Wonyong Sung, Kurt Keutzer
IEEE International Conference on Multimedia & Expo (ICME), pages 1797-1800, July 2009.
-
Parallel Prim's Algorithm with a Novel Extension
Ekaterina Gonina and Laxmikant Kale
PPL Technical Report. October 2007.
Courses taken at UC Berkeley
Spring 2012
| Number | Title | Instructor |
| EE225D | Audio Signal Processing in Humans and Machines | N. Morgan |
| STAT151B | Modern Statistical Prediction and Machine Learning | J. McAuliffe |
Fall 2011
| Number | Title | Instructor |
| CS298 | Acoustic Methods for Video Analysis | G. Friedland |
Fall 2010
| Number | Title | Instructor |
| CS294 | Productive Parallel Programming | David Patterson, Armando Fox |
Spring 2010
| Number | Title | Instructor |
| CS270 | Combinatorial Algorithms and Data Structures | Richard Karp |
| CS294 | Architecting Parallel Software using Patterns | Kurt Keutzer |
| CS298 | Readings in Spoken Language Processing | Nelson Morgan |
Fall 2009
| Number | Title | Instructor |
| CS281A/Stat241 | Statistical Learning Theory | Peter Bartlett |
Spring 2009
| Number | Title | Instructor |
| CS267 | Applications of Parallel Computing | James Demmel |
| CS294 | Special Topics - Patterns for Parallel Programming | Kurt Keutzer |
Fall 2008
| Number | Title | Instructor |
| CS262A | Advanced Topics in Computer Systems | Eric Brewer |
| CS280 | Computer Vision | Jitendra Malik |
Industry Experience
-
Google - Software Engineer Intern, Speech Recognition Team
May 2012 - August 2012
Developed a C++/CUDA dense linear algebra library used to train neural network models for speech recognition.
-
Microsoft Research Silicon Valley - Research Intern, Search Labs
January 2011-April 2011
Parallelized web mining applications on a cluster of multi-core nodes using the DryadLINQ data-parallel framework for large-scale data processing.
-
Microsoft Research Silicon Valley - Research Intern, Search Labs
June 2010-August 2010
Parallelized a large-scale product-offer matching algorithm on a cluster of multi-core nodes using DryadLINQ.
-
Intel Labs - Research Intern, Throughput Computing Labs
June 2009-August 2009
Parallelized finite-difference option pricing algorithms on Larrabee GPU and Nehalem multi-core CPU platforms.
-
Amazon.com - SDE Intern - Customer Self-Service Team
June 2008 - August 2008
Developed a prototype for an Item Viewer web tool using Perl Mason and JavaScript for displaying customers' recent order history. Item Viewer is now Amazon.com's self-service help tool to aid customers in searching their order history online.
Projects
Productive Gaussian Mixture Model Training with Applications in Speaker Diarization (UC Berkeley ParLab)
Typically, scientists with computational needs prefer to use high-level languages such as Python or MATLAB; however, large computationally intensive problems must eventually be recoded in a low-level language such as C or Fortran by expert programmers in order to achieve sufficient performance. In addition, multiple strategies may exist for mapping a problem onto parallel hardware depending on the input data size and the hardware parameters. We show how to preserve the productivity of high-level languages while obtaining the performance of the best low-level code variant for a given hardware platform and problem size using SEJITS, a set of techniques that leverages just-in-time code generation and compilation. As a case study, we demonstrate our technique for Gaussian Mixture Model training using the EM algorithm. With the addition of one line of code to import our framework, a domain programmer using an existing Python GMM library can run her program unmodified on a GPU-equipped computer and achieve performance that meets or beats GPU code hand-crafted by a human expert. We also show that despite the overhead of allowing the domain expert's program to use Python, and the overhead of just-in-time code generation and compilation, our approach still results in performance competitive with hand-crafted GPU code.
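As context, the EM training loop at the heart of this case study can be sketched in plain NumPy. This is an illustrative diagonal-covariance implementation with an invented name (`gmm_em`); it is not PyCASP's API nor the specialized GPU code:

```python
import numpy as np

def gmm_em(X, K, iters=20, seed=0):
    """Illustrative EM for a diagonal-covariance GMM (not the PyCASP API)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    means = X[rng.choice(N, K, replace=False)]       # init means from data points
    variances = np.ones((K, D)) * X.var(axis=0)      # init to the data variance
    weights = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: log responsibility of each component for each point
        log_p = (-0.5 * (((X[:, None, :] - means) ** 2) / variances
                         + np.log(2 * np.pi * variances)).sum(axis=2)
                 + np.log(weights))
        log_p -= log_p.max(axis=1, keepdims=True)    # stabilize before exp
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        Nk = resp.sum(axis=0)
        weights = Nk / N
        means = (resp.T @ X) / Nk[:, None]
        variances = (resp.T @ (X ** 2)) / Nk[:, None] - means ** 2 + 1e-6
    return weights, means, variances
```

The framework described above specializes exactly this kind of loop, generating and JIT-compiling a CUDA variant behind an unchanged Python interface.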
Machine Translation Decoder on Parallel Platforms (UC Berkeley ParLab)
Translation between natural languages is one of the most computationally challenging problems in natural language processing. State-of-the-art machine translators use statistical knowledge of languages, in the form of language rules and translation rules, to guide the translation process bottom-up from single words to the whole sentence. The machine translation algorithm explores a discrete set of possible translations by traversing the language and translation models represented as graphs. Unlikely translations are pruned at every level to keep the set of translations manageable. Each translation rule evaluation at one level of the translation granularity is independent, so when parallelizing a machine translation application, each thread can be assigned one translation rule evaluation. Each rule evaluation requires several language and translation model memory accesses; the number of these accesses and their exact pattern are dynamically determined by the input sentence and the exact pruning mechanisms employed. We are currently working on efficiently parallelizing the machine translation algorithm on Nvidia GPUs.
Large-Scale Offer-Matching Algorithm on Clusters of Multicore Processors (MSR Search Labs)
Offer matching is one of the key applications in e-commerce search engines. The search engine receives tens of millions of textual product descriptions (offers) every day from various merchants. In order to display the offers in a web search page, they need to be matched to a catalog that contains structured data for products in various categories. The new set of product offers has to be published on a daily basis in order to keep the content of the search engine up-to-date. We developed a parallel version of the offer-matching algorithm to improve performance and enable matching of much larger data sets on clusters of multicore processors using DryadLINQ and .NET multithreading and achieved significant performance improvement.
Large Vocabulary Continuous Speech Recognition on Parallel Platforms (UC Berkeley ParLab)
Parallel scalability allows an application to efficiently utilize an increasing number of processing elements. In this work we explore a design space for application scalability for an inference engine in large vocabulary continuous speech recognition (LVCSR). Our implementation of the inference engine involves a parallel graph traversal through an irregular graph-based knowledge network with millions of states and arcs. The challenge is not only to define a software architecture that exposes sufficient fine-grained application concurrency, but also to efficiently synchronize between an increasing number of concurrent tasks and to effectively utilize the parallelism opportunities in today's highly parallel processors. We propose four application-level implementation alternatives we call "algorithm styles", and construct highly optimized implementations on two parallel platforms: an Intel Core i7 multicore processor and an NVIDIA GTX280 manycore processor. The highest-performing algorithm style varies with the implementation platform. On a 44-minute speech data set, we demonstrate substantial speedups of 3.4x on the Core i7 and 10.5x on the GTX280 compared to a highly optimized sequential implementation on the Core i7, without sacrificing accuracy. The parallel implementations contain less than 2.5% sequential overhead, promising scalability and significant potential for further speedup on future platforms.
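The traversal pattern described above, expanding a pruned set of active states through the recognition network one frame at a time, can be sketched as follows. This is a hypothetical minimal token-passing step (`beam_search_step` is an invented name; costs are negative log-probabilities), not the optimized inference engine itself:

```python
def beam_search_step(active, arcs, beam=10.0):
    """One frame of beam-pruned token passing over a WFST-like network.
    active: {state: cost}; arcs: {state: [(next_state, arc_cost), ...]}.
    Returns the surviving active states for the next frame."""
    best = {}
    for s, cost in active.items():
        for t, arc_cost in arcs.get(s, []):
            new_cost = cost + arc_cost
            # Keep only the cheapest token reaching each state
            if t not in best or new_cost < best[t]:
                best[t] = new_cost
    if not best:
        return best
    # Prune states whose cost exceeds the best cost plus the beam width
    cutoff = min(best.values()) + beam
    return {s: c for s, c in best.items() if c <= cutoff}
```

In the parallel implementations, each arc (or state) evaluation within a frame maps to a concurrent task, and the pruning cutoff is the main point of synchronization per frame.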
Parallel Crank-Nicolson Method for Option Pricing (Intel Labs, UC Berkeley ParLab)
The Crank-Nicolson method is a sophisticated finite-difference method for solving the PDEs that arise in option pricing, a core application in computational finance. We explored both SIMD and core-level parallelization techniques for the underlying computation on three parallel platforms: Intel Core i7 (Nehalem), Larrabee, and the Nvidia GTX 280. Linear scaling was achieved on all three platforms. However, while there was enough parallelism to efficiently utilize both Nehalem and Larrabee, a similar implementation on the GTX280 did not see a comparable performance improvement; algorithmic transformation is therefore required to obtain significant speedup and scaling on the Nvidia GPU for this problem. This project was in collaboration with the Throughput Computing Lab at Intel, Santa Clara.
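As background, the Crank-Nicolson scheme for the Black-Scholes PDE can be sketched sequentially in NumPy. The name (`crank_nicolson_call`) and the grid parameters are illustrative; the project's SIMD and multi-core variants are not shown:

```python
import numpy as np

def crank_nicolson_call(S_max=300.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                        M=200, N=200):
    """Price a European call via Crank-Nicolson on the Black-Scholes PDE.
    Uniform grid in S with M steps, N time steps; Thomas solve per step."""
    dS, dt = S_max / M, T / N
    S = np.linspace(0.0, S_max, M + 1)
    V = np.maximum(S - K, 0.0)                      # payoff at maturity
    j = np.arange(1, M)
    # Standard Crank-Nicolson coefficients at interior nodes
    a = 0.25 * dt * (sigma**2 * j**2 - r * j)
    b = -0.5 * dt * (sigma**2 * j**2 + r)
    c = 0.25 * dt * (sigma**2 * j**2 + r * j)
    for n in range(N):                              # march backward in time
        t = T - (n + 1) * dt
        rhs = a * V[:-2] + (1 + b) * V[1:-1] + c * V[2:]
        # Boundary values at the new time level
        lo, hi = 0.0, S_max - K * np.exp(-r * (T - t))
        rhs[0] += a[0] * lo
        rhs[-1] += c[-1] * hi
        # Thomas algorithm for the implicit tridiagonal system
        sub, diag, sup = -a.copy(), (1 - b).copy(), -c.copy()
        for i in range(1, M - 1):
            w = sub[i] / diag[i - 1]
            diag[i] -= w * sup[i - 1]
            rhs[i] -= w * rhs[i - 1]
        Vn = np.empty(M - 1)
        Vn[-1] = rhs[-1] / diag[-1]
        for i in range(M - 3, -1, -1):
            Vn[i] = (rhs[i] - sup[i] * Vn[i + 1]) / diag[i]
        V[1:-1], V[0], V[-1] = Vn, lo, hi
    return S, V
```

The SIMD parallelism in the project lives along the spatial dimension of each time step; the sequential dependence across time steps is what limits further scaling without algorithmic transformation.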
Parallel Prim's Algorithm on Large Clusters (UIUC Parallel Programming Lab)
This research project consists of a parallel implementation and analysis of Prim's algorithm for finding a minimum spanning tree of a dense graph using MPI. The algorithm uses a novel extension, adding multiple vertices per iteration, to achieve significant performance improvements on large problems (up to 200,000 vertices). Experimental results illustrate the advantages of the approach on large complete graphs on over a thousand processors, along with some limiting factors, such as the bound on the number of vertices that can be added per iteration depending on the problem size.
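For reference, the classic sequential Prim's algorithm that the project extends can be sketched with a binary heap. The multi-vertex extension and the MPI distribution are not shown, and `prim_mst` is an illustrative name:

```python
import heapq

def prim_mst(adj):
    """Classic Prim's algorithm over an adjacency map {u: {v: weight}}.
    Returns (total_weight, tree_edges). Sequential sketch; the parallel
    version distributes the per-vertex key updates across MPI ranks."""
    start = next(iter(adj))
    visited = {start}
    heap = [(w, start, v) for v, w in adj[start].items()]
    heapq.heapify(heap)
    total, edges = 0, []
    while heap and len(visited) < len(adj):
        w, u, v = heapq.heappop(heap)     # lightest edge leaving the tree
        if v in visited:
            continue
        visited.add(v)
        total += w
        edges.append((u, v, w))
        for x, wx in adj[v].items():      # relax edges out of the new vertex
            if x not in visited:
                heapq.heappush(heap, (wx, v, x))
    return total, edges
```

The extension studied in the report grows the tree by several minimum-key vertices per iteration instead of one, amortizing the communication cost of each global minimum reduction.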
Women in CS
In the United States, the number of women in engineering and technology fields has been steadily declining. In Computer Science in particular, there has been a dramatic decrease in women's involvement (below 20%) over the last decade. With the increasing importance and integration of technology in our society, it is crucial to promote gender diversity in technical fields. There are a number of outreach programs across the country that aim to increase the interest of girls in grades K-12 in studying technical subjects. Many universities have local student chapters promoting gender diversity and working to increase the retention rate of women studying computer science and engineering. I believe it is important to work toward a greater representation of women in computing by starting locally and getting involved in such programs. If such programs don't exist in your area, there are a number of nation-wide resources that can help change that.

I have been involved in such organizations since 2005, as an undergrad at UIUC. I was technical director and treasurer of the Women in Computer Science (WCS) student organization at UIUC, and was part of the ChicTech outreach program for high school girls interested in science and technology. At UC Berkeley, I was Co-President of WICSE, the Women in Computer Science and Engineering student organization, for the 2009-2010 academic year. WICSE is a networking and advocacy organization for graduate women in the EECS department. Our organization puts a strong emphasis on promoting gender diversity and increasing retention rates of women in technical fields by providing mentoring programs and participating in outreach programs such as Girls Go Tech and Expanding Your Horizons.
Resources:
- Grace Hopper Conference for Women in Computer Science
- Anita Borg Institute for Women in Computing
- ACM-W - ACM's Committee on Women in Computing
- CRA-W - Committee on the Status of Women in Computing Research