A fundamental problem in computer vision is to simultaneously segment and recognize images. We can cast this problem as image labeling, where the aim is to find coherent regions in an image and assign a class label to each region. I will describe the most successful approaches to this problem, which involve inducing a set of latent features that encode contextual relations in the images. We extend the state of the art in this area by developing a framework that learns a novel context representation by capturing joint patterns in images and labels. Our approach can benefit from a variety of conditions across the labeled images in a training set, where the labels may come from different levels of specificity, may be noisy, or may be missing entirely.