This web site is under construction, September 2010.


This project concerns the statistical analysis of text from news media and other public sources, such as voting records. Our main goal is to uncover statistical associations between words or terms in large text corpora and to visualize how these associations change over time. The emphasis is on interpretable visualizations obtained with sparse statistical methods, such as sparse logistic regression. The project started informally in late 2007.
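As a rough illustration of how sparse logistic regression can surface word associations, the sketch below predicts whether a short document contains a query term from the other words it contains, using an L1 penalty so that most word weights are driven to zero. Everything in it is an assumption made for illustration: the toy corpus, the query term "election", the penalty value, and the plain proximal-gradient solver. It is not the project's actual code, data, or pipeline.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sparse_logistic_regression(rows, labels, l1_penalty, step=0.5, iterations=2000):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    The intercept is left unpenalized; only the word weights are shrunk.
    """
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * x[j] / n
        bias -= step * grad_b
        # Gradient step followed by soft-thresholding, which produces exact zeros.
        weights = [soft_threshold(w - step * g, step * l1_penalty)
                   for w, g in zip(weights, grad_w)]
    return weights, bias

# Hypothetical toy corpus: each string stands in for one news snippet.
documents = [
    "election vote senate campaign",
    "election vote poll",
    "vote campaign election market",
    "market stocks rally",
    "stocks market trade weather",
    "weather storm rain",
    "storm weather market",
    "election poll vote weather",
]
query = "election"  # hypothetical query term

tokenized = [doc.split() for doc in documents]
vocabulary = sorted({word for doc in tokenized for word in doc} - {query})

# Binary bag-of-words features; the label marks whether the query term occurs.
rows = [[1.0 if word in doc else 0.0 for word in vocabulary] for doc in tokenized]
labels = [1.0 if query in doc else 0.0 for doc in tokenized]

weights, bias = sparse_logistic_regression(rows, labels, l1_penalty=0.1)
associations = dict(zip(vocabulary, weights))

for word, weight in associations.items():
    if weight != 0.0:
        print(f"{word}: {weight:+.3f}")
```

Words that keep a nonzero weight are the ones the model associates with the query term, which is what makes the result easy to read and to plot. On a real corpus one would normally use an established solver and choose the penalty by validation rather than fixing it by hand.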

Our SnapDragon tool lets users query a variety of archival and current news data.


The project is currently funded by the NSF (grant NSF-CDI # 0835531) and by a research grant from Google.

Selected recent publications and talks

Some examples