We consider the problem of modeling annotated data: data with multiple types, where an instance of one type (such as a caption) serves as a description of another type (such as an image). We describe three hierarchical mixture models aimed at such data, culminating in the Corr-LDA model, a latent variable model that is effective at both joint clustering and automatic annotation. We evaluate these models experimentally on the Corel database of images and captions.
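The abstract does not spell out how Corr-LDA generates data. The sketch below follows the formulation commonly given for it, and it rests on several assumptions. Each image has Dirichlet-distributed topic proportions. Each region draws a topic and a Gaussian feature vector with diagonal covariance. Each caption word picks one region uniformly at random and is drawn from that region's topic. The function names (`sample_dirichlet`, `sample_corr_lda`), dimensions, and hyperparameter values are hypothetical. The code illustrates the word-region correspondence idea and does not reproduce the authors' implementation.

```python
import random

def sample_dirichlet(alpha, rng):
    # Dirichlet draw via normalized Gamma(alpha_k, 1) samples
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def sample_corr_lda(alpha, mu, sigma, beta, n_regions, n_words, rng):
    """Sample one (image, caption) pair from a Corr-LDA-style generative process.

    alpha: length-K Dirichlet prior over topics
    mu, sigma: K x D per-topic Gaussian means / std devs for region features
    beta: K x V per-topic word distributions
    """
    k = len(alpha)
    theta = sample_dirichlet(alpha, rng)                   # per-image topic proportions
    z = rng.choices(range(k), weights=theta, k=n_regions)  # one topic per region
    # region features: independent Gaussians per dimension (diagonal covariance)
    regions = [[rng.gauss(m, s) for m, s in zip(mu[t], sigma[t])] for t in z]
    y = [rng.randrange(n_regions) for _ in range(n_words)]  # each word picks a region uniformly
    vocab = range(len(beta[0]))
    # each word is drawn from the topic of the region it is tied to
    words = [rng.choices(vocab, weights=beta[z[i]])[0] for i in y]
    return theta, z, regions, y, words

# Usage with small, made-up dimensions: K topics, D feature dims, V vocabulary words
rng = random.Random(0)
K, D, V = 3, 2, 5
alpha = [1.0] * K
mu = [[rng.gauss(0, 3) for _ in range(D)] for _ in range(K)]
sigma = [[1.0] * D for _ in range(K)]
beta = [sample_dirichlet([1.0] * V, rng) for _ in range(K)]
theta, z, regions, y, words = sample_corr_lda(alpha, mu, sigma, beta, n_regions=4, n_words=3, rng=rng)
```

The indicator `y` ties each caption word to a region's topic. This is what lets the captions correspond to particular parts of the image instead of only to the image as a whole.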