Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

A Unified Theory of Inference for Text Understanding

Peter Norvig

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-87-339
January 1987

http://www.eecs.berkeley.edu/Pubs/TechRpts/1987/CSD-87-339.pdf

Natural languages, such as English, are difficult to understand not only because of the variety of forms that can be expressed, but also because of what is not explicitly expressed. The problem of deciding what was implied by a text, or "reading between the lines," is the problem of inference. Extracting the proper set of inferences from a text (the set intended by the text's author) requires a great deal of general knowledge on the part of the reader, as well as the capability to reason with this knowledge. When the "reader" is a computer program, it becomes very difficult to represent this knowledge so that it will be accessible when needed.

Past approaches to the problem of inference have often concentrated on a particular type of knowledge structure (such as a script) and postulated an algorithm tuned to process just that type of structure. The problem with this approach is that it is difficult to modify the algorithm when it comes time to add a new type of knowledge structure.

An alternative, unified approach is proposed. This approach is formalized in a computer program named FAUSTUS. The algorithm recognizes six very general classes of inference, classes that are not dependent on individual knowledge structures. Rather, the classes describe general kinds of connections between concepts. New kinds of knowledge can be added without modifying the algorithm. Thus, the complexity has been shifted from the algorithm to the knowledge base. To accommodate this, a powerful knowledge representation language named KODIAK is employed.

The resulting system is capable of drawing proper inferences (and avoiding improper ones) from a variety of texts, in some cases duplicating the efforts of other systems, and in other cases improving upon them. In each case, the same unified algorithm is used, without tuning the program specifically for the text at hand.

Advisor: Robert Wilensky


BibTeX citation:

@phdthesis{Norvig:CSD-87-339,
    Author = {Norvig, Peter},
    Title = {A Unified Theory of Inference for Text Understanding},
    School = {EECS Department, University of California, Berkeley},
    Year = {1987},
    Month = {Jan},
    URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/1987/5995.html},
    Number = {UCB/CSD-87-339},
    Abstract = {Natural languages, such as English, are difficult to understand not only because of the variety of forms that can be expressed, but also because of what is not explicitly expressed. The problem of deciding what was implied by a text, or "reading between the lines," is the problem of inference. Extracting the proper set of inferences from a text (the set intended by the text's author) requires a great deal of general knowledge on the part of the reader, as well as the capability to reason with this knowledge. When the "reader" is a computer program, it becomes very difficult to represent this knowledge so that it will be accessible when needed. Past approaches to the problem of inference have often concentrated on a particular type of knowledge structure (such as a script) and postulated an algorithm tuned to process just that type of structure. The problem with this approach is that it is difficult to modify the algorithm when it comes time to add a new type of knowledge structure. An alternative, unified approach is proposed. This approach is formalized in a computer program named FAUSTUS. The algorithm recognizes six very general classes of inference, classes that are not dependent on individual knowledge structures. Rather, the classes describe general kinds of connections between concepts. New kinds of knowledge can be added without modifying the algorithm. Thus, the complexity has been shifted from the algorithm to the knowledge base. To accommodate this, a powerful knowledge representation language named KODIAK is employed. The resulting system is capable of drawing proper inferences (and avoiding improper ones) from a variety of texts, in some cases duplicating the efforts of other systems, and in other cases improving upon them. In each case, the same unified algorithm is used, without tuning the program specifically for the text at hand.}
}

EndNote citation:

%0 Thesis
%A Norvig, Peter
%T A Unified Theory of Inference for Text Understanding
%I EECS Department, University of California, Berkeley
%D 1987
%@ UCB/CSD-87-339
%U http://www.eecs.berkeley.edu/Pubs/TechRpts/1987/5995.html
%F Norvig:CSD-87-339