Electrical Engineering and Computer Sciences


UC Berkeley


2009 Research Summary

Statistical Analysis of Estimators in Machine Learning

Percy Shuo Liang and Michael Jordan

NDSEG Fellowship and National Science Foundation 0509559

Probabilistic models play a prominent role in domains such as natural language processing, bioinformatics, and computer vision, where they provide methods for jointly reasoning about many interdependent variables. For prediction tasks, one generally models a conditional distribution over outputs given an input. There can be reasons, however, for pursuing alternatives to conditional modeling. First, we might be able to leverage additional statistical strength present in the input by using generative methods rather than discriminative ones. Second, the exact inference required for a full conditional likelihood could be intractable; in this case, one might turn to computationally more efficient alternatives such as pseudolikelihood.
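For concreteness, the three kinds of estimator mentioned above are commonly written as maximizers of different objectives over the same training set. The notation here is an assumption made for illustration and is not taken from the summary: training pairs $(x^{(i)}, y^{(i)})$ for $i = 1, \dots, n$, a parametric model $p_\theta$, and output components $y_j$, with $y_{-j}$ denoting all components of $y$ other than the $j$-th.

```latex
\begin{align*}
\hat\theta_{\text{gen}}
  &= \arg\max_\theta \sum_{i=1}^{n} \log p_\theta\bigl(x^{(i)}, y^{(i)}\bigr)
  && \text{(generative: joint likelihood)} \\
\hat\theta_{\text{disc}}
  &= \arg\max_\theta \sum_{i=1}^{n} \log p_\theta\bigl(y^{(i)} \mid x^{(i)}\bigr)
  && \text{(discriminative: conditional likelihood)} \\
\hat\theta_{\text{pseudo}}
  &= \arg\max_\theta \sum_{i=1}^{n} \sum_{j} \log p_\theta\bigl(y^{(i)}_j \mid y^{(i)}_{-j}, x^{(i)}\bigr)
  && \text{(pseudolikelihood)}
\end{align*}
```

Written this way, the generative objective models both inputs and outputs and the discriminative objective models only the outputs. The pseudolikelihood objective replaces the joint conditional over all output components with a sum of per-component conditionals, each of which needs normalization over a single component only.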

We present a unified framework for studying these estimators, which allows us to compare their relative statistical efficiencies. Our asymptotic analysis suggests that modeling more of the data tends to reduce variance, but at the cost of greater sensitivity to model misspecification. We are currently extending our asymptotic analysis tools to study general families of regularizers.
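The variance and misspecification tradeoff described above can be illustrated with a small simulation. This is a minimal sketch on a toy model chosen for this example, not the authors' experimental setup: a label s in {-1, +1} with equal probability, and a scalar input x drawn from N(s*mu, sigma^2). The generative estimator assumes unit variance, so it is well specified only when sigma = 1. The discriminative estimator is a one-parameter logistic regression, whose true weight is 2*mu/sigma^2 for any sigma. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample(n, mu, sigma, rng):
    """Draw n pairs (x, s): s is +1 or -1 with equal probability, x ~ N(s*mu, sigma^2)."""
    data = []
    for _ in range(n):
        s = 1 if rng.random() < 0.5 else -1
        data.append((rng.gauss(s * mu, sigma), s))
    return data


def fit_generative(data):
    """Joint-likelihood fit under the ASSUMED model x | s ~ N(s*mu, 1).

    The MLE of mu is the mean of s*x; the implied conditional is
    p(s | x) = sigmoid(s * w * x) with w = 2*mu.
    """
    mu_hat = sum(s * x for x, s in data) / len(data)
    return 2.0 * mu_hat


def fit_discriminative(data, iters=10):
    """Conditional-likelihood fit of p(s | x) = sigmoid(s * w * x) by Newton's method."""
    w = 0.0
    for _ in range(iters):
        # Gradient of the log-likelihood and the negated second derivative.
        grad = sum(s * x * sigmoid(-s * w * x) for x, s in data)
        curv = sum(x * x * sigmoid(w * x) * sigmoid(-w * x) for x, _ in data)
        w += grad / curv
    return w


def run(sigma, trials=200, n=200, mu=1.0, seed=0):
    """Refit both estimators on `trials` fresh datasets; return the two lists of weights."""
    rng = random.Random(seed)
    gen, disc = [], []
    for _ in range(trials):
        data = sample(n, mu, sigma, rng)
        gen.append(fit_generative(data))
        disc.append(fit_discriminative(data))
    return gen, disc


if __name__ == "__main__":
    for sigma in (1.0, math.sqrt(2.0)):
        gen, disc = run(sigma)
        true_w = 2.0 / sigma ** 2  # true conditional weight when mu = 1
        print(f"sigma^2 = {sigma ** 2:.1f}, true w = {true_w:.2f}")
        print(f"  generative:     mean {statistics.mean(gen):.3f}, "
              f"variance {statistics.variance(gen):.4f}")
        print(f"  discriminative: mean {statistics.mean(disc):.3f}, "
              f"variance {statistics.variance(disc):.4f}")
```

With sigma = 1 both estimators target the same weight, and the generative one, which also models x, should show the smaller spread across datasets. With sigma^2 = 2 the unit-variance assumption is wrong: the generative estimate concentrates around 2*mu rather than the true conditional weight, while the logistic fit should remain centered near it.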

P. Liang and M. I. Jordan, "An Asymptotic Analysis of Generative, Discriminative, and Pseudolikelihood Estimators," ICML, 2008.