## It is a Fundamental Limitation to Base Probability Theory on Bivalent Logic

(Professor Lotfi A. Zadeh)

Probability theory has long played—and is continuing to play—the role of the principal tool for dealing with problems in which uncertainty is a significant factor. The history of probability theory is a history of important advances in our understanding of decision-making in uncertain environments. But what we see alongside the brilliant successes are problem areas in which progress has been elusive. An example is the class of problems in which probabilities, utilities and relations are ill-defined in ways that put such problems well beyond the reach of existing methods.

A thesis that is put forth in this paper is that standard probability theory, call it PT, has fundamental limitations—limitations which are rooted in the fact that PT is based on bivalent logic. It is this thesis that underlies the radical-sounding title of the paper: It is a Fundamental Limitation to Base Probability Theory on Bivalent Logic.

The principal rationale for this thesis is that the conceptual structure of PT is directed at addressing the partiality of certainty but not the partiality of possibility and, most importantly, the partiality of truth. A direct consequence of these omissions is that PT lacks the capability to deal with information which is not just partially certain but also partially possible and/or partially true. An example is: Most analysts believe that it is very unlikely that there will be a significant increase in the price of oil in the near future.

The principal negative consequences of basing PT on bivalent logic are the following.

(1) Brittleness (discontinuity). Almost all concepts in PT are bivalent in the sense that a concept, C, is either true or false, with no partiality of truth allowed. For example, events A and B are either independent or not independent; a process, P, is either stationary or nonstationary; and so on. An example of brittleness is: if all A’s are B’s and all B’s are C’s, then all A’s are C’s; but if almost all A’s are B’s and almost all B’s are C’s, then all that can be said is that the proportion of A’s in C’s is between 0 and 1. In particular, brittleness of independence creates serious problems in the construction of Bayesian nets, since only an epsilon separates independence from dependence.
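
The failure of the “almost all” syllogism is easy to exhibit with concrete finite sets. The following sketch (the particular sets and the 90% reading of “almost all” are illustrative choices, not part of PT) constructs A, B and C such that at least 90% of A’s are B’s and at least 90% of B’s are C’s, yet no A is a C:

```python
# If all A's are B's and all B's are C's, then all A's are C's.
# With "almost all" (read here as: at least 90%), the conclusion
# can fail completely.
A = set(range(0, 10))       # 10 elements
B = set(range(1, 100))      # contains 9 of A's 10 elements, plus 90 others
C = set(range(10, 100))     # contains 90 of B's 99 elements, but no A's

def frac(X, Y):
    """Proportion of X's elements that also belong to Y."""
    return len(X & Y) / len(X)

print(frac(A, B))  # 0.9      -> "almost all A's are B's"
print(frac(B, C))  # 0.909... -> "almost all B's are C's"
print(frac(A, C))  # 0.0      -> yet no A is a C
```

The proportion of A’s in C’s really can take any value in [0, 1] under the stated premises; this construction pins it at the extreme value 0.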

(2) The dilemma of “it is possible but not probable.” A simple version of this dilemma is the following. Assume that A is a proper subset of B and that the Lebesgue measure of A is arbitrarily close to the Lebesgue measure of B. Now, what can be said about the probability measure, P(A), given the probability measure P(B)? The only assertion that can be made is that P(A) lies between 0 and P(B). The uninformativeness of this assessment of P(A) leads to counterintuitive conclusions. For example, suppose that with probability 0.99 Robert returns from work within one minute of 6pm. What is the probability that he is home at 6pm? Using PT, with no additional information and without recourse to the maximum entropy principle, the answer is: between 0 and 1. This simple example is an instance of a basic problem: what to do when we know what is possible but cannot assess the associated probabilities or probability distributions. A case in point relates to assessment of the probability of a worst-case scenario.
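
The dilemma can be made concrete with a crude numerical sketch. The two densities below are illustrative assumptions: both are supported on B = [0, 1], and A = [eps, 1] covers 99% of B by Lebesgue measure, yet P(A) depends entirely on where the probability mass happens to sit:

```python
import numpy as np

# B = [0, 1]; A = [eps, 1] is a proper subset of B whose Lebesgue
# measure (1 - eps = 0.99) is close to that of B.
eps = 0.01
x = np.linspace(0.0, 1.0, 100_001)
dx = x[1] - x[0]

uniform = np.ones_like(x)                    # mass spread evenly over B
spike = np.where(x < eps, 1.0 / eps, 0.0)    # all mass packed into [0, eps)

def prob_A(density):
    """Riemann-sum approximation of P(A) = integral of density over [eps, 1]."""
    return float(np.sum(density[x >= eps]) * dx)

print(prob_A(uniform))  # ~0.99: as large as the Lebesgue measure allows
print(prob_A(spike))    # 0.0: even though A covers 99% of B
```

Knowing only that A is possible (it nearly fills B), PT by itself cannot narrow P(A) below the full interval [0, P(B)].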

(3) Incapability to operate on perception-based information. This is the most serious limitation of PT. It is rooted in the fact that much of human knowledge is perception-based, e.g., “Most Swedes are tall,” “It is foggy in San Francisco during the summer,” “It is unlikely to be warm tomorrow,” and “Usually Robert returns from work at about 6pm.”

Perceptions are intrinsically imprecise, reflecting the bounded ability of sensory organs and, ultimately, the brain, to resolve detail and store information. More concretely, perceptions are f-granular in the sense that (a) the boundaries of perceived classes are unsharp; and (b) the values of perceived attributes are granulated, with a granule being a clump of values drawn together by indistinguishability, similarity, proximity or functionality.

F-granularity of perceptions puts them well beyond the reach of conventional meaning-representation methods based on bivalent logic. As a consequence, PT, by itself, lacks the capability to operate on perception-based information. As an illustration, PT cannot provide an answer to the query “What is the probability that Robert is home at about t pm?” given the perception-based information that (a) usually Robert leaves his office at about 5:30pm; and (b) usually it takes about 30 minutes to reach home.
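
While PT cannot process these premises, fuzzy arithmetic gives a flavor of what operating on them might involve. The sketch below is a simplified stand-in, not Zadeh’s formal PTp machinery: it represents “about 5:30pm” and “about 30 minutes” as triangular fuzzy numbers over a minutes-after-noon grid (the supports are assumed) and adds them via the extension principle, yielding an arrival time of “about 6:00pm”:

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.maximum(0.0, np.minimum((x - a) / (b - a), (c - x) / (c - b)))

# Minutes after noon: departure "about 5:30pm", travel "about 30 minutes".
dep_t = np.arange(320, 341)
trv_t = np.arange(20, 41)
dep_mu = tri(dep_t, 320, 330, 340)
trv_mu = tri(trv_t, 20, 30, 40)

# Extension principle: mu_arrive(z) = sup over x+y=z of min(mu_dep(x), mu_trv(y)).
arrive = {}
for x, mx in zip(dep_t, dep_mu):
    for y, my in zip(trv_t, trv_mu):
        z = int(x + y)
        arrive[z] = max(arrive.get(z, 0.0), min(mx, my))

peak = max(arrive, key=arrive.get)
print(peak)  # 360, i.e. "about 6:00pm", with full membership
```

Note that the result is itself a fuzzy set of arrival times, not a single number; it is this kind of graded output that bivalent machinery cannot express.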

What can be done to endow probability theory with the capability to operate on perception-based information? In a recent paper entitled “Toward a Perception-Based Theory of Probabilistic Reasoning with Imprecise Probabilities,” (Journal of Statistical Planning and Inference), I outlined an approach to a restructuring of probability theory which adds this capability to PT. Briefly, the proposed restructuring involves three stages of generalization: (a) f-generalization; (b) f.g-generalization; and (c) nl-generalization. More specifically:

(a) F-generalization involves a progression from crisp sets to fuzzy sets in PT, leading to a generalization of PT which is denoted as PT+. In PT+, probabilities, functions, relations, measures and everything else are allowed to have fuzzy denotations, that is, be a matter of degree. In particular, probabilities described as low, high, not very high, etc. are interpreted as labels of fuzzy subsets of the unit interval or, equivalently, as possibility distributions of their numerical values.
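
As a minimal illustration of such fuzzy denotations, the sketch below represents “unlikely” as a fuzzy subset of the unit interval and models the modifier “very” by squaring the membership function, a common convention in fuzzy logic. The particular membership shape and its breakpoint are assumptions, not prescribed by PT+:

```python
import numpy as np

p = np.linspace(0.0, 1.0, 101)     # probability values in the unit interval

# "unlikely": full membership at p = 0, falling linearly to 0 at p = 0.4
# (the breakpoint 0.4 is an illustrative assumption).
unlikely = np.clip((0.4 - p) / 0.4, 0.0, 1.0)
very_unlikely = unlikely ** 2      # "very" as the concentration modifier

print(unlikely[10], very_unlikely[10])  # memberships at p = 0.1
```

Squaring lowers every intermediate membership degree, so “very unlikely” is a more concentrated fuzzy subset of [0, 1] than “unlikely,” as intuition suggests.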

(b) F.g-generalization involves fuzzy granulation of variables, functions, relations, etc., leading to a generalization of PT which is denoted as PT++. By fuzzy granulation of a variable, X, is meant a partition of the range of X into fuzzy granules, with a granule being a clump of values of X which are drawn together by indistinguishability, similarity, proximity or functionality. Membership functions of such granules are usually assumed to be triangular or trapezoidal. Basically, granularity reflects the bounded ability of the human mind to resolve detail and store information.
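
A granulation of this kind can be sketched directly. Below, the range of a variable (height, in centimetres) is partitioned into three fuzzy granules with trapezoidal membership functions; the breakpoints are illustrative assumptions, chosen so that the memberships at any height sum to 1:

```python
import math

def trap(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], flat on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0

# Three fuzzy granules for height in cm (breakpoints are assumed).
granules = {
    "short":  (-math.inf, -math.inf, 150, 170),
    "medium": (150, 170, 180, 190),
    "tall":   (180, 190, math.inf, math.inf),
}

for h in (140, 160, 175, 185, 200):
    mu = {name: round(trap(h, *abcd), 2) for name, abcd in granules.items()}
    print(h, mu)
```

A height of 160 cm, for instance, belongs to “short” and “medium” to degree 0.5 each; the unsharp boundary between granules is exactly the f-granularity described above.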

(c) Nl-generalization involves an addition to PT++ of a capability to operate on propositions expressed in a natural language, with the understanding that such propositions serve as descriptors of perceptions. Nl-generalization of PT leads to perception-based probability theory, denoted as PTp. By construction, PTp has the capability needed to answer the query in the Robert example: What is the probability that Robert is home at about t pm?

The principal thrust of what was said above may be viewed as a call for a recognition that standard probability theory has serious limitations—limitations which are rooted in its bivalent base and the fundamental conflict between bivalence and reality. What is needed to circumvent these limitations is a restructuring of the foundations of probability theory—a restructuring aimed principally at endowing probability theory with the capability to operate on perception-based information.

* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-mail: zadeh@cs.berkeley.edu. Research supported in part by ONR Contract N00014-00-1-0621, ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-117, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341 and the BISC Program of UC Berkeley.