# Bayesian Hierarchical Clustering

### Katherine Heller

University of Cambridge

### Abstract

We present a novel algorithm for agglomerative hierarchical clustering
based on evaluating marginal likelihoods of a probabilistic model.
This algorithm has several advantages over traditional distance-based
agglomerative clustering algorithms. (1) It defines a probabilistic
model of the data, which can be used to compute the predictive
distribution of a test point and its probability of belonging to
any of the existing clusters in the tree. (2) It uses a model-based
criterion, rather than an ad hoc distance metric, to decide which
clusters to merge. (3) Bayesian hypothesis testing is used to decide which merges
are advantageous and to output the recommended depth of the tree. (4)
The algorithm can be interpreted as a novel fast bottom-up approximate
inference method for a Dirichlet process (i.e., countably infinite)
mixture model (DPM). It approximates the marginal likelihood of a DPM
by summing over exponentially many clusterings of the data in
polynomial time. We describe procedures for learning the model
hyperparameters, computing the predictive distribution, and extensions
to the algorithm. Experimental results on synthetic and real-world
data sets demonstrate useful properties of the algorithm, including
recent results on evaluating the lower bound on the marginal
likelihood of a DPM.
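
The merge criterion summarized above can be sketched in code. The following is a hedged illustration, not the paper's reference implementation: it assumes binary data with Beta-Bernoulli marginal likelihoods, and the hyperparameter values (`ALPHA`, `A`, `B`) and toy data set are assumptions made for the example. Each candidate merge is scored by the posterior probability r_k that the merged points belong to one cluster, combining the marginal likelihood p(D_k | H1) with the Dirichlet-process-derived prior weight pi_k.

```python
import math
from itertools import combinations

ALPHA = 1.0      # DPM concentration parameter (assumed value)
A, B = 1.0, 1.0  # Beta prior pseudo-counts per binary feature (assumed)

def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_marginal(counts, n):
    """log p(D | H1): all n points drawn i.i.d. from one Bernoulli cluster,
    with the Bernoulli parameters integrated out under a Beta(A, B) prior."""
    lp = 0.0
    for c in counts:  # c = number of ones observed for this feature
        lp += (math.lgamma(A + B) - math.lgamma(A) - math.lgamma(B)
               + math.lgamma(A + c) + math.lgamma(B + n - c)
               - math.lgamma(A + B + n))
    return lp

class Node:
    def __init__(self, counts, n, log_d, log_p_tree, children=()):
        self.counts, self.n = counts, n
        self.log_d = log_d            # log d_k from the DP recursion
        self.log_p_tree = log_p_tree  # log p(D_k | T_k)
        self.children = children

def leaf(x):
    # Single point: d_k = alpha and p(D_k | T_k) = p(D_k | H1).
    return Node(list(x), 1, math.log(ALPHA), log_marginal(x, 1))

def merge(i, j):
    """Form the merged node and return it with its posterior merge
    probability r_k = pi_k * p(D_k | H1) / p(D_k | T_k)."""
    counts = [a + b for a, b in zip(i.counts, j.counts)]
    n = i.n + j.n
    log_ag = math.log(ALPHA) + math.lgamma(n)      # log(alpha * (n-1)!)
    log_d = logsumexp2(log_ag, i.log_d + j.log_d)  # d_k recursion
    log_pi = log_ag - log_d                        # prior merge weight pi_k
    log_h1 = log_marginal(counts, n)
    # p(D_k|T_k) = pi_k p(D_k|H1) + (1 - pi_k) p(D_i|T_i) p(D_j|T_j)
    log_p_tree = logsumexp2(
        log_pi + log_h1,
        math.log1p(-math.exp(log_pi)) + i.log_p_tree + j.log_p_tree)
    r = math.exp(log_pi + log_h1 - log_p_tree)
    return Node(counts, n, log_d, log_p_tree, (i, j)), r

def bhc(points):
    """Greedy agglomeration: repeatedly merge the pair with highest r."""
    nodes = [leaf(x) for x in points]
    while len(nodes) > 1:
        (node, r), i, j = max(
            ((merge(i, j), i, j) for i, j in combinations(nodes, 2)),
            key=lambda t: t[0][1])
        nodes.remove(i)
        nodes.remove(j)
        nodes.append(node)
    return nodes[0]

# Toy data (assumed for illustration): two loose groups of binary vectors.
data = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0],
        [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0], [0, 0, 1, 1, 1, 1]]
root = bhc(data)
```

Merges with r_k < 0.5 are those where the Bayesian hypothesis test favours keeping the subtrees separate; cutting the tree at such merges yields the recommended depth mentioned in point (3).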