# Nonparametric Bayesian Models for Machine Learning

### Romain Jean Thibaux

### EECS Department

University of California, Berkeley

Technical Report No. UCB/EECS-2008-130

October 14, 2008

PDF: http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-130.pdf

This thesis presents general techniques for inference in various nonparametric Bayesian models, furthers our understanding of the stochastic processes at the core of these models, and develops new models of data based on these findings. In particular, we develop new Monte Carlo algorithms for Dirichlet process mixtures based on a general framework. We extend the vocabulary of processes used for nonparametric Bayesian models by proving many properties of beta and gamma processes. Specifically, we show how to perform probabilistic inference in hierarchies of beta and gamma processes, and how this naturally leads to improvements to the well-known naïve Bayes algorithm. We demonstrate the robustness and speed of the resulting methods by applying them to a classification task with 1 million training samples and 40,000 classes.
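The abstract mentions Monte Carlo inference for Dirichlet process mixtures without giving details. As background only, and not the thesis's own algorithm, the sketch below draws a random partition from the Chinese restaurant process. This is the predictive rule of a Dirichlet process with concentration `alpha` that standard collapsed Gibbs samplers for DP mixtures build on. The function name `sample_crp` and its parameters are illustrative assumptions.

```python
import random
from typing import List


def sample_crp(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Draw a partition of n items from a Chinese restaurant process (sketch).

    Item i (0-based) joins existing table k with probability
    counts[k] / (i + alpha) and opens a new table with probability
    alpha / (i + alpha). Returns the table index assigned to each item.
    """
    if n < 0 or alpha <= 0:
        raise ValueError("need n >= 0 and alpha > 0")
    rng = random.Random(seed)
    counts: List[int] = []       # counts[k] = number of items seated at table k
    assignments: List[int] = []  # assignments[i] = table chosen by item i
    for _ in range(n):
        # Unnormalized weights: current table sizes, then alpha for a new table.
        # They sum to i + alpha, so random.choices normalizes them correctly.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)     # open a new table (a new mixture component)
        else:
            counts[k] += 1       # join an existing table ("rich get richer")
        assignments.append(k)
    return assignments


if __name__ == "__main__":
    z = sample_crp(20, alpha=1.0, seed=1)
    print(z, "number of clusters:", len(set(z)))
```

Larger `alpha` tends to produce more tables; the expected number of tables grows roughly like `alpha * log(n)`. In a collapsed Gibbs sampler for a DP mixture, these same weights are multiplied by each component's likelihood of the data point before sampling.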

**Advisor:** Michael Jordan

BibTeX citation:

@phdthesis{Thibaux:EECS-2008-130, Author = {Thibaux, Romain Jean}, Title = {Nonparametric Bayesian Models for Machine Learning}, School = {EECS Department, University of California, Berkeley}, Year = {2008}, Month = {Oct}, URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-130.html}, Number = {UCB/EECS-2008-130}, Abstract = {This thesis presents general techniques for inference in various nonparametric Bayesian models, furthers our understanding of the stochastic processes at the core of these models, and develops new models of data based on these findings. In particular, we develop new Monte Carlo algorithms for Dirichlet process mixtures based on a general framework. We extend the vocabulary of processes used for nonparametric Bayesian models by proving many properties of beta and gamma processes. In particular, we show how to perform probabilistic inference in hierarchies of beta and gamma processes, and how this naturally leads to improvements to the well-known na\"{i}ve Bayes algorithm. We demonstrate the robustness and speed of the resulting methods by applying them to a classification task with 1 million training samples and 40,000 classes.} }

EndNote citation:

%0 Thesis %A Thibaux, Romain Jean %T Nonparametric Bayesian Models for Machine Learning %I EECS Department, University of California, Berkeley %D 2008 %8 October 14 %@ UCB/EECS-2008-130 %U http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-130.html %F Thibaux:EECS-2008-130