Learning with imperfect data

Mehryar Mohri
Courant Institute of Mathematical Sciences

Abstract

Earlier learning theory and algorithms were developed for an ideal world. Modern large-scale data sets and applications raise problems that must be addressed for learning to be effective: training points are often poorly labeled, the sample can be biased, the distribution may drift over time, and the sample points may not be i.i.d.

This talk will address the specific problem of domain adaptation, which arises when the distribution of the labeled source data differs somewhat from that of the target domain. It will present novel theoretical results for adaptation and provide algorithmic solutions derived from that theory. It will also report some preliminary experimental results.

Joint work with Yishay Mansour and Afshin Rostamizadeh.

Mehryar Mohri is a Professor of Computer Science at the Courant Institute of Mathematical Sciences in New York. He completed his undergraduate studies at Ecole Polytechnique and his graduate and Ph.D. studies in mathematics and computer science in Paris at Ecole Normale Superieure d'Ulm and University Paris 7 - Denis Diderot.

Mohri worked for about ten years (1995-2004) at AT&T Research, formerly AT&T Bell Labs. During his last four years there, he served as the Head of the Speech Algorithms Department and as a Technology Leader, overseeing research projects in machine learning, text and speech processing, and the design of general algorithms.

Dr. Mohri is also a research consultant at Google Research. His current topics of interest are machine learning, theory and algorithms, text and speech processing, and computational biology.