Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

Contextualizing Retrieval of Full-Length Documents

Marti A. Hearst

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-94-789
January 1994

http://www.eecs.berkeley.edu/Pubs/TechRpts/1994/CSD-94-789.pdf

We address some issues relating to retrieval from unfamiliar text collections consisting of full-length documents. We claim that displaying query results in terms of inter-document similarity is inappropriate with long texts, and suggest instead that the results of simple initial queries should be contextualized according to category sets that correspond to the main topics of the texts. We argue that main topics of long texts should be represented by multiple categories, since in most cases one category cannot adequately classify a text. We describe a new automatic categorization algorithm that does not require pre-labeled texts and a prototype browsing interface that presents a simple mechanism for displaying multi-dimensional information.


BibTeX citation:

@techreport{Hearst:CSD-94-789,
    Author = {Hearst, Marti A.},
    Title = {Contextualizing Retrieval of Full-Length Documents},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {1994},
    Month = {Jan},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/1994/5388.html},
    Number = {UCB/CSD-94-789},
    Abstract = {We address some issues relating to retrieval from unfamiliar text collections consisting of full-length documents. We claim that displaying query results in terms of inter-document similarity is inappropriate with long texts, and suggest instead that the results of simple initial queries should be contextualized according to category sets that correspond to the main topics of the texts. We argue that main topics of long texts should be represented by multiple categories, since in most cases one category cannot adequately classify a text. We describe a new automatic categorization algorithm that does not require pre-labeled texts and a prototype browsing interface that presents a simple mechanism for displaying multi-dimensional information.}
}

EndNote citation:

%0 Report
%A Hearst, Marti A.
%T Contextualizing Retrieval of Full-Length Documents
%I EECS Department, University of California, Berkeley
%D 1994
%@ UCB/CSD-94-789
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/1994/5388.html
%F Hearst:CSD-94-789