Natural Language Processing Group (NLP)
Daniel Klein, Alexandre Bouchard-Cote, John Sturdy DeNero, Aria Delier Haghighi, Percy Shuo Liang, Adam David Pauls, Slav Orlinov Petrov and David Burkett
National Science Foundation, Microsoft, MICRO and Defense Advanced Research Projects Agency
The Berkeley Natural Language Processing Group builds systems that analyze and understand human language data. We combine statistical modeling techniques with linguistically rich structural representations to deal efficiently with the complexities and ambiguities of natural language. Our group focuses in particular on building systems that maximize elegance, accuracy, and efficiency.
Our research covers a range of topics in natural language processing. Broadly, we work on the following areas:
- Linguistic analysis: disambiguating the syntactic and semantic structures of text. Our work in this area includes syntactic parsing, semantic role labeling, and coreference. One highlight: our parser is extremely fast and is currently the best parser for several languages. Current projects include Parsing, Historical Linguistics, and Unsupervised Coreference;
- Machine translation: mapping text in one language into another. Our current work in MT is focused on syntax-based translation and discriminative learning methods. We have also studied word alignment (translation lexicon learning) heavily and have built several of the best word alignment systems. Current projects include Machine Translation and Word Alignment;
- Speech recognition: transcribing audio into text. We are applying the learning methods from our successful parser to the speech recognition problem; and
- Unsupervised learning: detecting and inducing hidden structure. Humans learn language without supervision; can machines? We have demonstrated that a range of linguistic structures, including grammar, coreference, word classes, and translation lexica, can be effectively learned in an unsupervised fashion. Current projects include Parsing, Historical Linguistics, and Unsupervised Coreference.