Word Sense Disambiguation and Hierarchical Faceted Metadata
Hierarchical faceted metadata support browsing interfaces that narrow a search through a set of items using several properties of those items at once, e.g. price, size, and heel height for shoes. For large collections, manually placing every item in the browsing structure is expensive and far too time-consuming. Castanet [Stoica and Hearst, 2007] is an algorithm for automatically creating such structures from metadata associated with each item in the collection.
I worked on integrating word sense disambiguation (WSD) into Castanet. In the process, I implemented graph-based and LDA-based disambiguation strategies and learned their behaviors firsthand, along with the frustrations of actually using WordNet for anything.
I also extended Castanet to operate on unstructured metadata, such as the tags on images in the Flickr Commons collection of public images. The images can be browsed using the Flamenco faceted browsing interface here.
Attempted Improvements to Graph-based WSD
Graph-based word sense disambiguation (WSD) is a class of approaches to choosing the correct sense of an ambiguous word for a given context. Such methods construct a graph representing the various senses of the words in a context: each sense is a node, and edges between nodes are weighted by some measure of semantic similarity. Typically, some measure of graph centrality is then applied to the graph to identify the most dominant sense for each word in the context. These methods perform at the state of the art.
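To make the general recipe concrete, here is a minimal sketch of the idea, not the algorithms evaluated in this project. It assumes a tiny hand-written sense inventory (the `SENSES` dictionary and its sense labels are hypothetical, standing in for WordNet), uses Lesk-style gloss overlap as the similarity measure, and uses weighted degree as the centrality measure. Real systems typically use richer similarity metrics and centrality measures such as PageRank.

```python
from itertools import combinations

# Hypothetical toy sense inventory: word -> {sense label: gloss}.
# A real system would draw senses and glosses from WordNet.
SENSES = {
    "bank": {
        "bank#river": "sloping land beside a body of water such as a river",
        "bank#finance": "financial institution that accepts deposits and lends money",
    },
    "deposit": {
        "deposit#money": "money placed in a financial institution account",
        "deposit#sediment": "matter laid down by water or a natural process",
    },
    "loan": {
        "loan#money": "money that a financial institution lends at interest",
    },
}

STOPWORDS = {"a", "an", "the", "of", "in", "at", "by", "or", "and", "that", "such", "as"}


def gloss_overlap(gloss_a, gloss_b):
    """Similarity = number of shared non-stopword tokens (Lesk-style)."""
    words_a = set(gloss_a.split()) - STOPWORDS
    words_b = set(gloss_b.split()) - STOPWORDS
    return len(words_a & words_b)


def disambiguate(context, senses=SENSES):
    """Pick one sense per word by weighted degree centrality."""
    # One node per (word, sense) pair for every word in the context.
    nodes = [(word, sense) for word in context for sense in senses[word]]
    centrality = {node: 0 for node in nodes}

    # Connect senses of *different* words; edge weight is the gloss overlap.
    for (w1, s1), (w2, s2) in combinations(nodes, 2):
        if w1 == w2:
            continue  # senses of the same word compete, they do not support each other
        weight = gloss_overlap(senses[w1][s1], senses[w2][s2])
        centrality[(w1, s1)] += weight
        centrality[(w2, s2)] += weight

    # For each word, keep the sense whose node is most central.
    return {
        word: max(senses[word], key=lambda s: centrality[(word, s)])
        for word in context
    }


if __name__ == "__main__":
    print(disambiguate(["bank", "deposit", "loan"]))
```

On this toy context the financial senses reinforce one another through shared gloss words, so they end up more central than the river and sediment senses. Weighted degree is the simplest possible centrality choice and is used here only to keep the sketch short.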
Over the course of a year, I investigated various graph-based WSD algorithms, and devised modifications to both the algorithms and the similarity metrics, as described in this project report. No improvements over existing performance were achieved.