We show that Bayesian inference can be *statistically inconsistent* when the model is wrong. More precisely, we present a family ('model') M of probability distributions, a distribution P outside M, and a Bayesian prior distribution on M, such that
- M contains a distribution Q within a small distance \delta from P.
- Nevertheless, when data are sampled i.i.d. according to P, then, no matter how many data are observed, the Bayesian posterior puts nearly all its mass on distributions whose distance from P is much larger than \delta.
- The classifier based on the Bayesian posterior can perform substantially worse than random guessing, no matter how many data are observed, even though the classifier based on Q performs much better than random guessing.
- The result holds for a variety of distance functions, including the KL (relative entropy) divergence.
- M may be chosen to contain only a countable number of distributions. This seems to make the result fundamentally different from earlier Bayesian inconsistency results by Diaconis, Freedman and Barron.
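
For readers who prefer symbols, the claims above can be sketched for the KL case as follows. This is only one possible formalization under assumed notation: the densities p, q, q', the prior \pi, the sample z_1, ..., z_n, and the threshold c are introduced here for illustration and are not taken from the original text.

```latex
% Assumed notation: p, q, q' are densities of P, Q, Q'; \pi is the prior on M.
% KL (relative entropy) divergence from P to Q:
\[
  D(P \,\|\, Q) \;=\; \mathbb{E}_{Z \sim P}\!\left[ \log \frac{p(Z)}{q(Z)} \right].
\]
% Bayesian posterior on M after observing z_1, \dots, z_n sampled i.i.d. from P:
\[
  \pi(Q' \mid z_1, \dots, z_n) \;\propto\; \pi(Q') \prod_{i=1}^{n} q'(z_i).
\]
% Shape of the inconsistency claim: a good approximation Q exists in M,
\[
  \exists\, Q \in M : \; D(P \,\|\, Q) \le \delta ,
\]
% yet, for some c much larger than \delta and every sample size n, the posterior
% mass of the set of poor approximations remains close to 1:
\[
  \pi\bigl( \{\, Q' \in M : D(P \,\|\, Q') \ge c \,\} \;\big|\; z_1, \dots, z_n \bigr) \;\approx\; 1 .
\]
```

The last display deliberately leaves "close to 1" informal, since the text only says "nearly all its mass" and does not state the precise mode of convergence.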
This work is a follow-up to joint work with John Langford, Yahoo Research, New York, appearing in Machine Learning 66(2-3), 2007.