StatNews Project: Statistical Analysis of News Media

This web site is under construction, September 2010.

Overview

The project concerns the statistical analysis of text data from news media and other public sources, such as voting records. Our main goal is to uncover statistical associations between words or terms in large text corpora, and to visualize the dynamics of these associations. The emphasis is on interpretable visualizations obtained via sparse statistical methods, such as sparse logistic regression. The project started informally in late 2007.
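To make the idea of sparse logistic regression for word association concrete, here is a minimal sketch. It is not the project's code: the toy corpus, the query term, and the penalty and step-size values are all hypothetical, and the solver is a plain proximal-gradient (ISTA) loop chosen for readability rather than speed. Each document is labeled by whether it contains the query term, and an L1 penalty drives the weights of most other words to exactly zero, leaving a short list of associated words.

```python
import math

# Hypothetical toy corpus; a real news archive would be far larger.
docs = [
    "china trade tariffs rise",
    "china trade talks resume",
    "china trade economy growth",
    "china trade tariffs economy",
    "local election results announced",
    "weather forecast sunny weekend",
    "election campaign rally weekend",
    "sports results final score",
]
query = "china"  # hypothetical query term

# Label: does the document mention the query? Features: presence of other words.
labels = [1 if query in d.split() else 0 for d in docs]
vocab = sorted({w for d in docs for w in d.split()} - {query})
X = [[1 if w in d.split() else 0 for w in vocab] for d in docs]


def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_sparse_logistic(X, y, lam=0.1, step=0.5, iters=2000):
    """L1-penalized logistic regression via proximal gradient descent.

    Returns (weights, intercept). The intercept is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted prob. minus label
            grad_b += err / n
            for j, xj in enumerate(row):
                if xj:
                    grad_w[j] += err / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then shrinkage for the L1 penalty.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


weights, intercept = fit_sparse_logistic(X, labels)

# Words whose weight survived the penalty are the reported associations.
selected = {word: wt for word, wt in zip(vocab, weights) if abs(wt) > 1e-9}
print(selected)
```

On this toy corpus, "trade" appears in every document that mentions the query and in no other, so with the penalty used here it should be the only word retained. Raising lam yields shorter word lists and lowering it yields longer ones, which is the trade-off between interpretability and fit that sparse methods expose.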

Our SnapDragon tool allows queries to be made on a variety of archival and current news data.

Funding

The project is currently funded by NSF (grant NSF-CDI #0835531) and by a research grant from Google.

Selected recent publications and talks

B. Gawalt, J. Jia, L. Miratrix, L. El Ghaoui, B. Yu, S. Clavier. Discovering word associations in news media via feature selection and sparse classification.

Statistical Analysis of Online News, talk given at the Information Systems Laboratory Colloquium, Stanford, November 2007. A shorter version was presented at the Berkeley Center for New Media, October 2008.
