
Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley


2008 Research Summary

A Permutation-Augmented Sampler for Dirichlet Process Mixture Models


Percy Shuo Liang, Michael Jordan and Ben Taskar

Dirichlet process (DP) mixture models have been usefully employed as a clustering methodology in a variety of applied areas such as bioinformatics, vision, and topic modeling. By treating the number of mixture components as random, DP mixtures provide an appealing nonparametric approach to mixture modeling in which the complexity of the model adapts to the complexity inherent in the data.
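
As a point of reference, the Chinese restaurant process (CRP) gives an equivalent sequential view of a DP mixture in which the number of clusters grows with the data. The sketch below is only illustrative: the Gaussian components, the hyperparameter values, and the function name sample_crp_mixture are our own choices for exposition, not details taken from the project.

import numpy as np

def sample_crp_mixture(n, alpha=1.0, dim=2, seed=None):
    """Draw n points from a DP mixture via the Chinese restaurant process."""
    rng = np.random.default_rng(seed)
    counts, means = [], []              # cluster sizes and Gaussian means
    assignments = np.zeros(n, dtype=int)
    data = np.zeros((n, dim))
    for i in range(n):
        # Join an existing cluster with probability proportional to its size,
        # or open a new cluster with probability proportional to alpha.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):            # open a new cluster
            counts.append(0)
            means.append(rng.normal(0.0, 3.0, size=dim))
        counts[k] += 1
        assignments[i] = k
        data[i] = rng.normal(means[k], 1.0)
    return data, assignments

X, z = sample_crp_mixture(200, alpha=1.0, seed=0)
print("clusters used:", z.max() + 1)    # varies from run to run; not fixed in advance

Because new clusters keep being opened as more data arrive (at a roughly logarithmic rate in n), the number of components is random and the model complexity adapts to the data rather than being fixed in advance.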

We introduce a new inference algorithm for Dirichlet process mixture models [1]. While Gibbs sampling and variational methods focus on local moves, the new algorithm makes more global moves. This is done by introducing a permutation of the data points as an auxiliary variable. The algorithm is a blocked sampler that alternates between sampling the clustering and sampling the permutation. The key to the efficiency of this approach is that it is possible to use dynamic programming to consider all exponentially many clusterings consistent with a given permutation. We also show that random projections can be used to effectively sample the permutation. The result is a stochastic hill-climbing algorithm that yields burn-in times significantly smaller than those of collapsed Gibbs sampling.
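
To make the dynamic programming step concrete: once a permutation is fixed, the clusterings consistent with it are those whose clusters form contiguous blocks in the permutation order, so sampling a clustering reduces to sampling a segmentation. The sketch below is a minimal illustration of that reduction, assuming a one-dimensional Gaussian model with a conjugate prior on the mean for the per-block marginal likelihood; the model, the hyperparameters, and the function names block_score and sample_clustering_given_permutation are our own illustrative choices rather than the paper's exact construction.

import numpy as np
from scipy.special import gammaln

def block_score(x, alpha=1.0, prior_var=10.0, noise_var=1.0):
    """log[ alpha * (m-1)! * marginal likelihood ] for a contiguous block x of 1-d data."""
    m = len(x)
    # Marginal likelihood under mean ~ N(0, prior_var), x_i | mean ~ N(mean, noise_var).
    post_var = 1.0 / (1.0 / prior_var + m / noise_var)
    post_mean = post_var * x.sum() / noise_var
    loglik = (-0.5 * m * np.log(2 * np.pi * noise_var)
              - 0.5 * (x @ x) / noise_var
              + 0.5 * post_mean ** 2 / post_var
              + 0.5 * np.log(post_var / prior_var))
    # alpha * (m-1)! is the CRP prior factor contributed by one cluster of size m.
    return np.log(alpha) + gammaln(m) + loglik

def sample_clustering_given_permutation(x, seed=None):
    """Sample a segmentation (clustering) of the already-permuted data x by dynamic programming."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # log_f[j]: log of the summed scores of all segmentations of x[:j].
    log_f = np.full(n + 1, -np.inf)
    log_f[0] = 0.0
    for j in range(1, n + 1):
        terms = [log_f[i] + block_score(x[i:j]) for i in range(j)]
        log_f[j] = np.logaddexp.reduce(terms)
    # Walk backwards, choosing each block boundary in proportion to its weight.
    blocks, j = [], n
    while j > 0:
        logw = np.array([log_f[i] + block_score(x[i:j]) for i in range(j)])
        probs = np.exp(logw - np.logaddexp.reduce(logw))
        i = rng.choice(j, p=probs)
        blocks.append(list(range(i, j)))
        j = i
    return blocks[::-1]

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-4, 1, 20), rng.normal(4, 1, 20)])  # already in permuted order
print(sample_clustering_given_permutation(x, seed=0))

The forward pass costs O(n^2) block-score evaluations yet implicitly sums over all 2^(n-1) segmentations of the permuted sequence, which is what allows the sampler to make global moves in a single step.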

[1] P. Liang, M. I. Jordan, and B. Taskar, "A Permutation-Augmented Sampler for Dirichlet Process Mixture Models," Proceedings of the International Conference on Machine Learning (ICML), 2007.