A Prototype-Centered Approach to Adding Deduction Capability to Search Engines--The Concept of Protoform

(Professor Lotfi A. Zadeh)

Existing search engines have many remarkable capabilities. But what is not among them is deduction capability--the capability to answer a query by drawing on information which resides in various parts of the knowledge base or is augmented by the user. An example of a simple query which cannot be answered by any existing search engine is: "How many UC Berkeley alumni were born in California?" In this case, the query and the query-relevant information are crisp. In the query "How many UC Berkeley alumni are wealthy?" the query-relevant information is crisp but the query is fuzzy. The problem--which is not widely recognized--is that much of the information in the knowledge base of a search engine is perception-based. Methods based on bivalent logic and standard probability theory lack capability to operate on perception-based information. A search engine with deduction capability is, in effect, a question-answering system.

Limited progress toward a realization of deduction capability is achievable through application of methods based on bivalent logic and standard probability theory. But to move beyond the reach of standard methods it is necessary to change direction. In the approach which is outlined, a concept which plays a pivotal role is that of a prototype--a concept which has a position of centrality in human reasoning, recognition, search and decision processes.

Informally, a prototype may be defined as a sigma-summary, that is, a summary of summaries. With this definition as the point of departure, a prototypical form, or protoform, for short, is defined as an abstracted prototype. As a simple example, the protoform of the proposition "Most Swedes are tall" is "QA's are B's," where Q is a fuzzy quantifier, and A and B are labels of fuzzy sets. Similarly, the protoform of "Usually Robert returns from work at about 6 pm," is "Prob (A) is B," where A is a fuzzy event and B is a fuzzy probability.

Abstraction has levels, just as summarization does. For example, in the case of "Most Swedes are tall," successive abstracted forms are "Most A's are tall," "Most A's are B's," and "QA's are B's."

At a specified level of abstraction, propositions are PF-equivalent if they have identical protoforms. For example, propositions "Usually Robert returns from work at about 6 pm" and "In winter, the average daily temperature in Berkeley is usually about fifteen degrees centigrade," are PF-equivalent. The importance of the concepts of protoform and PF-equivalence derives in large measure from the fact that they serve as a basis for knowledge compression.

A knowledge base is assumed to consist of a factual database, FDB, and a deduction database, DDB. In both FDB and DDB, knowledge is assumed to fall into two categories: (a) crisp and (b) fuzzy. Examples of crisp items of knowledge in FDB might be: “Height of the Eiffel tower is 324m” and “Paris is the capital of France.” Examples of fuzzy items might be “Most Swedes are tall,” and “California has a temperate climate.” Similarly, in DDB, an example of a crisp rule might be “If A and B are crisp convex sets, then their intersection is a crisp convex set.” An example of a fuzzy rule might be “If A and B are fuzzy convex sets, then their intersection is a fuzzy convex set.” A fuzzy rule may be a crisp assertion about fuzzy sets or a fuzzy assertion about crisp sets or a fuzzy assertion about fuzzy sets.

The deduction database is assumed to consist of a logical database and a computational database, with the rules of deduction having the structure of protoforms. An example of a computational rule is "If Q A's are B's and Q (A and B)'s are C's," then "Q Q A's are (B and C)'s,” where Q and Q are fuzzy quantifiers, and A, B and C are labels of fuzzy sets. The number of rules in the computational database is assumed to be very large in order to allow a chaining of rules that may be query-relevant.

A very simple example of deduction in the prototype-centered approach—an example which involves string matching but no chaining--is the following. Suppose that a query is “How many Swedes are very tall?” A protoform of this query is: ?Q A’s are 2B, where Q is a fuzzy quantifier and 2B is assumed to represent the meaning of “very B,” with the membership function of 2B being the square of the membership function of B. Searching DDB, we find the rule “If Q A’s are B then Q1/2 A’s are 2B,” whose consequent matches the query, with ?Q instantiated to Q1/2, A to “Swedes,” and B to “tall.” Furthermore, in FDB, we find the fact “Most Swedes are tall,” which matches the antecedent of the rule, with Q instantiated to “Most.” A to “Swedes” and B to “tall.” Consequently, the answer to the query is “Most1/2 Swedes are very tall,” where the membership function of “Most1/2” is the square root of Most in fuzzy arithmetic.

Another example of deduction is a simple version of what is known as the Robert example. Specifically, from the perception, p. “Usually Robert returns from work at about 6:00pm,” what is to be deduced is the answer to the question, q, “What is the probability that Robert is home at about t pm,” where t is a specified time.

In this case, the protoform of p is: Prob(A) is B, where A is an abstraction of the fuzzy event “Robert returns from work at about 6:00pm,” and B is an abstraction of the fuzzy probability “usually.” Similarly, the protoform of q is: Prob(C) is ?D, where C is an abstraction of the fuzzy event “Robert returns from work before about t pm,” and D is an abstraction of the probability which is the object of deduction.

An applicable deduction rule in DDB is

Prob(A) is B
Prob(C) is D

in which D is the solution of a variational problem involving the membership function of A, B, C and D. The details of solution may be found in a forthcoming paper “Toward a Perception-Based Theory of Probabilistic Reasoning with Imprecise Probabilities,” to appear in the Journal of Statistical Planning and Inference.

The rules of deduction in the prototype-centered approach involve generalized constraint propagation, with generalized constraints represented as protoforms. Deduction is carried out through backtracking from query to query-relevant information. The concept of protoform has some links to the concept of ontology in ontological engineering, but unlike the latter is rooted in fuzzy logic and perception-based probability theory.

What should be underscored is that the problem of adding deduction capability to search engines is many-faceted and complex. It would be unrealistic to expect rapid progress toward its solution.

* Professor in the Graduate School and director, Berkeley initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-Mail: zadeh@cs.berkeley.edu . Research supported in part by ONR N00014-00-1-0621, ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-117, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341 and the BISC Program of UC Berkeley.

More information (http://www.cs.berkeley.edu/People/Faculty/Homepages/zadeh.html) or

Send mail to the author : (zadeh@cs.berkeley.edu)

Edit this abstract