WordSeer: A text analysis environment for humanities scholars
More source text in the humanities is digitized every day. The resulting collections are too large for scholars to study in depth without computational assistance. To help, we built WordSeer, a text analysis tool that combines visualizations with analysis of the grammatical structure of text.
Our user studies with humanities scholars are showing that, compared to standard tools, WordSeer makes it easier for them to translate their questions into queries and to find answers.
Investigating the New York Times Linked Open Data Set
The New York Times linked open data set is an index of people, places, organizations, and topics, along with the articles in which they have appeared since 1981. An API is available for querying the data set for people, places, organizations, and topics, along with keywords and co-occurrence information.
The data set is so thoroughly annotated that it supports many interesting questions that would be difficult to ask of other data sets, where the intermediate problems of named entity extraction, coreference labeling, and temporal order identification would have to be solved first. We are looking at applying graph-based, network-based, and natural language processing techniques to this data set to learn relationships and trends.
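As a rough illustration of the graph-based direction, the sketch below builds a weighted co-occurrence graph from annotated articles: entities are nodes, and an edge's weight is the number of articles that mention both entities. The records, field names, and the `cooccurrence_counts` helper are all hypothetical and are not taken from the real API or the data set's schema.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each article lists the entities it was annotated with.
# The field names and values are illustrative, not the data set's real schema.
articles = [
    {"year": 1995, "entities": ["Person A", "Organization X"]},
    {"year": 1995, "entities": ["Person A", "Organization X", "Place Y"]},
    {"year": 2003, "entities": ["Person A", "Place Y"]},
]


def cooccurrence_counts(records):
    """Count how many articles mention each unordered pair of entities."""
    counts = Counter()
    for record in records:
        # Sort and de-duplicate so each pair has one canonical ordering.
        for pair in combinations(sorted(set(record["entities"])), 2):
            counts[pair] += 1
    return counts


edges = cooccurrence_counts(articles)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

Because the entities are already annotated, the graph can be built by simple counting. Filtering the records by year before counting would give one graph per period, which is one possible way to look at trends.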
Here is a visual explorer I made, still very much in alpha.