Kernel-based learning algorithms work by embedding the data into a Euclidean space and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the so-called kernel matrix, a symmetric positive semidefinite matrix that encodes the relative positions of all points. Specifying this matrix amounts to specifying the geometry of the embedding space and inducing a notion of similarity in the input space--classical model selection problems in machine learning. In this project we show how the kernel matrix can be learned from data via semidefinite programming (SDP) techniques. When applied to a kernel matrix associated with both training and test data, this gives a powerful transductive algorithm: using the labeled part of the data one can learn an embedding also for the unlabeled part, and the similarity between test points is inferred from training points and their labels. Importantly, these learning problems are convex, so we obtain a method for learning both the model class and the function without local minima. Finally, the novel approach presented in the project is supported by positive empirical results.
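The sketch below illustrates one way such an SDP can be set up in practice: it learns a linear combination of fixed base kernels over the training and test points, constrained to be positive semidefinite with a bounded trace, by maximizing alignment with the label kernel on the training block. This is a simplified variant for illustration only, not the exact formulation used in the project; it assumes the cvxpy library with an SDP-capable solver, and the base kernel matrices, labels, and function names are hypothetical.

    import cvxpy as cp
    import numpy as np

    def learn_kernel(base_kernels, y_train, trace_bound=10.0):
        """base_kernels: list of (n x n) Gram matrices over training + test points;
        y_train: +/-1 labels for the first len(y_train) points (the training block)."""
        n = base_kernels[0].shape[0]
        n_train = len(y_train)
        mu = cp.Variable(len(base_kernels))        # weights of the base kernels
        K = cp.Variable((n, n), PSD=True)          # learned kernel matrix
        constraints = [K == sum(mu[i] * Ki for i, Ki in enumerate(base_kernels)),
                       cp.trace(K) == trace_bound]
        # Alignment of the training block with the ideal label kernel y y^T.
        Y = np.outer(y_train, y_train)
        objective = cp.Maximize(cp.sum(cp.multiply(Y, K[:n_train, :n_train])))
        cp.Problem(objective, constraints).solve()
        return K.value

Because the train-test and test-test blocks of K are optimized jointly with the train-train block, solving the SDP fixes the geometry of the test points as well, which is what makes the procedure transductive.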
In this project we consider the problem of novelty detection, presenting an algorithm that aims to find a minimal region in input space containing a fraction alpha of the probability mass underlying a data set. This algorithm, the "single-class minimax probability machine (MPM)," is built on a distribution-free methodology that minimizes the worst-case probability of a data point falling outside of a convex set, given only the mean and covariance matrix of the distribution and making no further distributional assumptions. We present a robust approach to estimating the mean and covariance matrix within the general two-class MPM setting, and show how this approach specializes to the single-class problem. We provide empirical results comparing the single-class MPM to the single-class SVM and a two-class SVM method.
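The distribution-free guarantee behind the MPM family rests on a Chebyshev-type bound: given only a mean and covariance, the worst-case probability that a point falls on the wrong side of a hyperplane has a closed form. The sketch below computes that bound; it is an illustration of the underlying inequality, not the project's full single-class algorithm, and the function and variable names are ours.

    import numpy as np

    def worst_case_tail_probability(w, b, mu, Sigma):
        """Worst-case P(w'x <= b) over all distributions with mean mu and
        covariance Sigma (no further assumptions), assuming w'mu > b."""
        margin = float(w @ mu - b)
        if margin <= 0:
            return 1.0       # no guarantee: the mean itself violates the constraint
        d2 = margin ** 2 / float(w @ Sigma @ w)
        return 1.0 / (1.0 + d2)

Minimizing a bound of this form over the parameters of a convex set, with robust estimates of the mean and covariance plugged in, gives the flavor of the single-class MPM described above.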
Optimal solutions to Markov decision problems (MDPs) are often sensitive to the state transition probabilities. In many practical problems, the estimation of the transition probabilities is far from accurate. Hence, estimation errors are, together with the curse of dimensionality, a limiting factor in applying MDPs to real-world problems. We propose an algorithm for solving finite-state and finite-action MDPs, where the solution is guaranteed to be robust with respect to estimation errors in the state transition probabilities. Our algorithm involves a statistically accurate yet numerically efficient representation of uncertainty via likelihood or entropy functions. The worst-case complexity of the robust algorithm is within a factor log n (n is the number of states) of the original Bellman recursion; hence, robustness can be added at almost no extra computing cost. We applied our algorithm to the dynamic routing of an aircraft under stochastic obstacles, where the probabilities of the obstacles are unknown but bounded, and showed that our robust strategies yield a significant reduction in delay relative to the nominal strategy when the data are uncertain.
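As an illustration of the robust Bellman recursion described above, the sketch below runs value iteration in which an adversary perturbs each nominal transition distribution before the expectation is taken. The project's uncertainty sets are built from likelihood or entropy bounds; for simplicity the sketch substitutes an L1 ball of radius eps around the nominal estimate, for which the inner minimization has a simple greedy solution. All names and parameters are illustrative.

    import numpy as np

    def worst_case_value(p_nominal, values, eps):
        """Adversarial inner step for one (state, action) pair: pick the transition
        distribution within an L1 ball of radius eps around the nominal estimate
        that minimizes the expected next-state value."""
        p = p_nominal.copy()
        worst = int(np.argmin(values))
        budget = min(eps / 2.0, 1.0 - p[worst])
        p[worst] += budget
        for s in np.argsort(values)[::-1]:      # drain mass from high-value states
            if s == worst:
                continue
            take = min(p[s], budget)
            p[s] -= take
            budget -= take
            if budget <= 1e-12:
                break
        return float(p @ values)

    def robust_value_iteration(P, R, gamma=0.95, eps=0.2, iters=500):
        """P: (S, A, S) nominal transition probabilities; R: (S, A) rewards."""
        n_states, n_actions = R.shape
        V = np.zeros(n_states)
        for _ in range(iters):
            Q = np.array([[R[s, a] + gamma * worst_case_value(P[s, a], V, eps)
                           for a in range(n_actions)] for s in range(n_states)])
            V = Q.max(axis=1)
        return V, Q.argmax(axis=1)   # robust values and a robust greedy policy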
We consider the problem of modeling annotated data--data with multiple types where the instance of one type (such as a caption) serves as a description of the other type (such as an image). We describe three hierarchical mixture models that are aimed at such data, culminating in the Corr-LDA model, a latent variable model that is effective at both joint clustering and automatic annotation. We conduct experiments to test these models using the Corel database of images and captions.
Independent component analysis is a recent statistical method for revealing hidden factors of sets of random variables, where the hidden factors (the components) are assumed to be statistically independent. It has been successfully applied to problems such as audio blind source separation--the so-called "cocktail party" problem, where streams of overlapping speech have to be separated according to the individual speakers--and biomedical data processing.
Kernel methods, as exemplified by support vector machines, are a novel paradigm for pattern recognition. They efficiently enable one to extend well-known and well-studied linear techniques to become nonlinear techniques.
The object of our work is to use kernel methods to perform ICA. We show how, by applying the classical multivariate statistical technique of canonical correlations to a kernel space, we obtain a family of nonlinear ICA algorithms. On synthetic examples, our "Kernel-ICA" algorithms outperform many of the known algorithms for ICA.
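The sketch below shows, in minimal form, the kernel canonical correlation computation that underlies such a contrast function: given two candidate source signals, it centers their Gram matrices and solves a regularized generalized eigenvalue problem whose largest eigenvalue is the first kernel canonical correlation. The Kernel-ICA algorithms minimize a contrast such as -0.5*log(1 - rho^2) over rotations of the estimated sources; that outer optimization, and the low-rank approximations used for efficiency, are omitted. Kernel choice, bandwidth, and regularization constants are placeholders.

    import numpy as np
    from scipy.linalg import eigh

    def rbf_gram(x, sigma=1.0):
        d = x[:, None] - x[None, :]
        return np.exp(-d ** 2 / (2 * sigma ** 2))

    def center(K):
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n
        return H @ K @ H

    def first_kernel_canonical_correlation(x1, x2, sigma=1.0, reg=1e-2):
        """Largest regularized kernel canonical correlation between two 1D signals."""
        n = len(x1)
        K1, K2 = center(rbf_gram(x1, sigma)), center(rbf_gram(x2, sigma))
        R1, R2 = K1 + reg * n * np.eye(n), K2 + reg * n * np.eye(n)
        A = np.block([[np.zeros((n, n)), K1 @ K2],
                      [K2 @ K1, np.zeros((n, n))]])
        B = np.block([[R1 @ R1, np.zeros((n, n))],
                      [np.zeros((n, n)), R2 @ R2]])
        rho = eigh(A, B, eigvals_only=True)[-1]   # eigenvalues are returned in ascending order
        return rho                                # contrast: -0.5 * np.log(1 - rho**2)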
We present a class of algorithms for learning the structure of graphical models from data. The algorithms are based on a measure known as the kernel generalized variance (KGV), which essentially allows us to treat all variables on equal footing as Gaussians in a feature space obtained from Mercer kernels. Thus we are able to learn hybrid graphs involving discrete and continuous variables of arbitrary type. We explore the computational properties of our approach, showing how to use the kernel trick to compute the relevant statistics in linear time. We illustrate our framework with experiments involving discrete and continuous data.
We propose a generalization of independent component analysis (ICA), where instead of looking for a linear transform that makes the data components independent, we look for a transform that makes the data components well fit by a tree-structured graphical model. Treating the problem as a semiparametric statistical problem, we show that the optimal transform is found by minimizing a contrast function based on mutual information, a function that directly extends the contrast function used for classical ICA. We provide two approximations of this contrast function, one using kernel density estimation, and another using kernel generalized variance.
This tree-dependent component analysis framework leads naturally to an efficient general multivariate density estimation technique where only bivariate density estimation needs to be performed.
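The sketch below shows the tree-fitting step in its simplest form: estimate pairwise dependence between the transformed components and fit the best tree by a maximum-weight spanning tree over those weights (the classical Chow-Liu construction). For brevity it uses the Gaussian approximation -0.5*log(1 - corr^2) of mutual information in place of the kernel-based contrasts described above; function names are ours.

    import numpy as np

    def pairwise_mi_gaussian(X):
        """Gaussian approximation of pairwise mutual information between columns of X."""
        C = np.corrcoef(X, rowvar=False)
        MI = -0.5 * np.log(np.clip(1.0 - C ** 2, 1e-12, None))
        np.fill_diagonal(MI, 0.0)
        return MI

    def chow_liu_tree(MI):
        """Maximum-weight spanning tree (Prim's algorithm) over the MI weights."""
        d = MI.shape[0]
        in_tree, edges = {0}, []
        while len(in_tree) < d:
            best = None
            for i in in_tree:
                for j in range(d):
                    if j not in in_tree and (best is None or MI[i, j] > MI[best]):
                        best = (i, j)
            edges.append(best)
            in_tree.add(best[1])
        return edges   # list of (parent, child) pairs defining the tree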
In recent years, high throughput biological techniques have been developed to collect data on the genomic response of an organism to stress. We employ DNA microarray technology to gain a survival snapshot of haploinsufficient Saccharomyces cerevisiae (bakers' yeast) in the presence of several antifungal drugs, anticancer drugs, and environmental stresses. Using well-known statistical and machine learning techniques, we identify protein targets and side effects of these stresses with a measure of confidence. This information is further used to hypothesize previously unknown protein functions, molecular and genetic pathways, and drug interactions.
We develop three-dimensional shape contexts as part of an approach to 3D object recognition from point clouds. 3D shape contexts are semi-local descriptions of object shape centered at points on an object's surface, and are a natural extension of 2D shape contexts introduced by Belongie, Malik, and Puzicha [1] for recognition in 2D images. 3D shape contexts are joint histograms of point density parameterized by radius, azimuth, and elevation. These features are similar in spirit to spin images [2], which have shown good performance in 3D object recognition tasks. Spin images are two-dimensional descriptors, summing over the azimuth angle, whereas shape contexts preserve information in all three dimensions.
To recognize objects, we compute shape contexts at a few randomly chosen points on a query image, and find their nearest neighbors in a stored set of shape contexts for a set of sample points on 3D object models. The model with the smallest combined distances is taken to be the best match. Finding nearest neighbors in high dimensions is computationally expensive, so we explore the use of clustering and locality sensitive hashing [3] to speed up the search while maintaining accuracy. Results are shown for both full 3D models and simulated range data.
Figure 1: Visualization of the histogram bins of the 3D shape context
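The sketch below computes a basic 3D shape context at a single basis point: neighboring points are binned jointly by (log-spaced) radius, azimuth, and elevation. It omits refinements such as alignment with the surface normal and normalization by bin volume and local point density; the bin counts and radii are illustrative defaults, not values taken from the project.

    import numpy as np

    def shape_context_3d(points, center, n_r=5, n_az=12, n_el=6, r_min=0.1, r_max=2.0):
        """Joint histogram of neighboring point density in (radius, azimuth, elevation)
        bins around `center` (a minimal sketch)."""
        d = points - center
        r = np.linalg.norm(d, axis=1)
        keep = (r > 1e-9) & (r <= r_max)
        d, r = d[keep], r[keep]
        az = np.arctan2(d[:, 1], d[:, 0])             # azimuth in [-pi, pi]
        el = np.arcsin(np.clip(d[:, 2] / r, -1, 1))   # elevation in [-pi/2, pi/2]
        # Log-spaced radial bins emphasize nearby structure.
        r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
        r_bin = np.clip(np.digitize(r, r_edges) - 1, 0, n_r - 1)
        az_bin = ((az + np.pi) / (2 * np.pi) * n_az).astype(int) % n_az
        el_bin = np.clip(((el + np.pi / 2) / np.pi * n_el).astype(int), 0, n_el - 1)
        hist = np.zeros((n_r, n_az, n_el))
        np.add.at(hist, (r_bin, az_bin, el_bin), 1.0)
        return hist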
In this work we study the problem of combining region and boundary cues for natural image segmentation. We employ a large database of manually segmented images in order to learn an optimal affinity function between pairs of pixels. These pairwise affinities can then be used to cluster the pixels into visually coherent groups. Region cues are computed as the similarity in brightness, color, and texture between image patches. Boundary cues are incorporated by looking for the presence of an "intervening contour," a large gradient along a straight line connecting two pixels.
We first use the dataset of human segmentations to individually optimize parameters of the patch and gradient features for brightness, color, and texture cues. We then quantitatively measure the power of different feature combinations by computing the precision and recall of classifiers trained using those features. The mutual information between the output of the classifiers and the same-segment indicator function provides an alternative evaluation technique that yields identical conclusions.
As expected, the best classifier makes use of brightness, color, and texture features, in both patch and gradient forms. We find that for brightness, the gradient cue outperforms the patch similarity. In contrast, using color patch similarity yields better results than using color gradients. Texture is the most powerful of the three channels, with both patches and gradients carrying significant independent information. Interestingly, the proximity of the two pixels does not add any information beyond that provided by the similarity cues. We also find that the convexity assumptions made by the intervening contour approach are supported by the ecological statistics of the dataset.
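The sketch below illustrates the intervening-contour cue in its simplest form: the affinity between two pixels decays with the strongest boundary response found along the straight line joining them. The boundary map, the Gaussian scale sigma, and the sampling density are placeholders; in the project this cue is combined with the patch-based similarities in a learned classifier rather than used alone.

    import numpy as np

    def intervening_contour_affinity(boundary_map, p, q, sigma=0.1, n_samples=20):
        """Affinity between pixels p and q (row, col tuples): low if a strong
        boundary lies on the straight line between them."""
        (r0, c0), (r1, c1) = p, q
        t = np.linspace(0.0, 1.0, n_samples)
        rows = np.round(r0 + t * (r1 - r0)).astype(int)
        cols = np.round(c0 + t * (c1 - c0)).astype(int)
        max_gradient = boundary_map[rows, cols].max()
        return np.exp(-max_gradient ** 2 / (2 * sigma ** 2))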

Figure 1: Pixel affinity images. The first row shows an image with one pixel selected. The remaining rows show the similarity between that pixel and all other pixels in the image, where white is most similar. Rows 2-4 show our patch-only, contour-only, and patch+contour affinity models. Rows 5 and 6 show the pixel similarity as given by the groundtruth data, where white corresponds to more agreement between humans. Row 6 shows simply the same-segment indicator function, while row 5 is computed using intervening contour on the human boundary maps.

Figure 2: Performance of humans compared to our best pixel affinity models. The dots show the precision and recall of each of 1366 human segmentations in the 250-image test set when compared to the other humans' segmentations of the same image. The large dot marks the median recall (99%) and precision (63%) of the humans. The iso-F-measure curve at F=77% is extended from this point to represent the frontier of human performance for this task. The three remaining curves represent our patch-only model, contour-only model, and patch+contour model. Neither patches nor contours are sufficient, as there is significant independent information in the patch and contour cues. The model used throughout the paper is a logistic function with quadratic terms, which performs best among the classifiers tried on this dataset.
Spectral graph theoretic methods have recently shown great promise for the problem of image segmentation. However, due to the computational demands of such methods, applications to large problems such as spatiotemporal data and high-resolution imagery have been slow to appear. The contribution of this work is a method that substantially reduces the computational requirements of grouping algorithms based on spectral partitioning, making it feasible to apply them to very large grouping problems. Our approach is based on a technique for the numerical solution of eigenfunction problems known as the Nyström method. This method allows extrapolation of the complete grouping solution using only a small number of "typical" samples. In doing so, we successfully exploit the fact that there are far fewer coherent groups in a scene than pixels.
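The core of the Nyström idea is that the leading eigenvectors of the full affinity matrix can be extrapolated from two small blocks: the affinities among the m samples and the affinities between the samples and the remaining points. The sketch below shows that extension step only; it leaves out the orthogonalization and the normalization needed for Normalized Cuts, and the affinity function is assumed to be supplied by the caller.

    import numpy as np

    def nystrom_extension(affinity_fn, samples, others):
        """Approximate eigenvectors of the full affinity matrix from the
        sample-sample block A and the sample-other block B."""
        A = affinity_fn(samples, samples)          # (m, m)
        B = affinity_fn(samples, others)           # (m, n - m)
        lam, U = np.linalg.eigh(A)                 # eigendecomposition of the sample block
        keep = lam > 1e-10
        lam, U = lam[keep], U[:, keep]
        U_others = B.T @ U / lam                   # Nystrom extrapolation to unsampled points
        V = np.vstack([U, U_others])               # approximate eigenvectors (not orthonormalized)
        return V, lam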
The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness, color, and texture associated with natural boundaries. In order to combine the information from these features in an optimal way, we train a classifier using human-labeled images as ground truth. The output of this classifier provides the posterior probability of a boundary at each image location and orientation. We present precision-recall curves showing that the resulting detector significantly outperforms existing approaches. Our two main results are (1) that cue combination can be performed adequately with a simple linear model, and (2) that a proper treatment of texture is required to detect boundaries in natural images.
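Result (1) above says that a simple linear model suffices for cue combination. The sketch below shows what that looks like operationally: a logistic model maps per-pixel, per-orientation cue responses to a posterior boundary probability. Feature extraction (the brightness, color, and texture gradients) is not shown, and the use of scikit-learn here is an illustrative substitute for the project's own training procedure.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_boundary_classifier(cue_features, boundary_labels):
        """cue_features: (n_pixels, n_cues) local cue responses (e.g., BG, CG, TG);
        boundary_labels: (n_pixels,) 0/1 ground truth from human-labeled images."""
        clf = LogisticRegression(max_iter=1000)
        clf.fit(cue_features, boundary_labels)
        return clf

    def boundary_posterior(clf, cue_features):
        # Posterior probability of a boundary at each pixel/orientation.
        return clf.predict_proba(cue_features)[:, 1]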

Figure 1: Two decades of boundary detection. The performance of our boundary detector compared to classical boundary detection methods and to the human subjects' performance. A precision-recall curve is shown for five boundary detectors: (1) Gaussian derivative (GD); (2) Gaussian derivative with hysteresis thresholding (GD+H), the Canny detector; (3) a detector based on the second moment matrix (2MM); (4) our grayscale detector that combines brightness and texture (BG+TG); and (5) our color detector that combines brightness, color, and texture (BG+CG+TG). Each detector is represented by its precision-recall curve, which measures the tradeoff between accuracy and noise as the detector's threshold varies. Shown in the caption is each curve's F-measure, valued from zero to one. The F-measure is a summary statistic for a precision-recall curve. The points on the plot show the precision and recall of each ground truth human segmentation when compared to the other humans. The median F-measure for the human subjects is 0.80. The solid curve shows the F=0.80 curve, representing the frontier of human performance for this task.

Figure 2: Local image features. In each row, the first panel shows an image patch. The following panels show feature profiles along the patch's horizontal diameter. The features are raw image intensity, brightness gradient BG, color gradient CG, raw texture gradient TG, and localized texture gradient TG. The vertical red line in each profile marks the patch center. The scale of each feature has been chosen to maximize performance on the set of training images--2% of the image diagonal (5.7 pixels) for CG and TG, and 1% of the image diagonal (3 pixels) for BG. The challenge is to combine these features in order to detect and localize boundaries.

Figure 3: Boundary images for three grayscale detectors. Columns 2-4 show P_b images for the Canny detector, the second moment matrix (2MM), and our brightness+texture detector (BG+TG). The human segmentations are shown for comparison. The BG+TG detector benefits from operating at a large scale without sacrificing localization and the suppression of edges on the interior of textured regions.
The problem we consider in this project is to take a single two-dimensional image containing a human body, locate the joint positions, and use these to estimate the body configuration and pose in three-dimensional space. The basic approach is to store a number of exemplar 2D views of the human body in a variety of different configurations and viewpoints with respect to the camera. On each of these stored views, the locations of the body joints (left elbow, right knee, etc.) are manually marked and labelled for future use. The test shape is then matched to each stored view, using the technique of shape context matching. Assuming that there is a stored view sufficiently similar in configuration and pose, the correspondence process will succeed. The locations of the body joints are then transferred from the exemplar view to the test shape. Given the joint locations, the 3D body configuration and pose are then estimated. We present results of our method on a corpus of human pose data.
Automatic speech recognition (ASR) provides a natural interface to small form-factor computers (such as PDAs) since keyboards and large displays are absent on these platforms. However, large vocabulary, robust ASR requires hardware resources far beyond those available on current PDAs. Emerging architectures, such as Vector IRAM at UC Berkeley, and Imagine at Stanford, provide a partial solution by delivering very high performance for relatively little expenditure of power. However, for speech recognition to take advantage of these architectures, the components of the system must be redesigned with the new systems in mind.
We are currently adapting the workstation-based ASR system used at ICSI to run efficiently on these architectures. Two out of the three major components of ICSI's speech system, the acoustic front-end and the phoneme probability estimator, contain computational kernels that are very regular (FFT and matrix-matrix multiply, respectively). These components run extremely efficiently on both architectures. The third component, the decoder, consists of a highly pruned (and therefore irregular) search through all possible utterances. Thus, the primary focus of our current effort is on this portion of the speech system.
Our initial implementation consists of a small vocabulary system. With a small vocabulary, it is not necessary to share state among similar words; rather, one can evaluate all the words separately. This allows an efficient, regular implementation. On IRAM, we arrange batches of words with total length equal to the vector length. On Imagine, we batch words such that the total length will fit in the cluster memory. We are in the process of analyzing the results of this approach.
Future work includes running a large vocabulary system on these architectures. This involves picking a search order that will maximize reuse of state from previous searches (e.g., if the word "architecture" has already been processed, most of the work can be reused for the word "architectural"). Language modeling, beam pruning, and least-upper-bound path calculations may also be accelerated on these architectures.
This work further explores the multi-band approach to automatic speech recognition (ASR), using probabilistic graphical models (PGMs) to classify a set of intermediate speech attributes. The term "multi-band approach" refers to an approach that independently processes non-overlapping frequency channels, or bands, of speech, in contrast to the conventional "full-band approach," which uses the entire bandwidth of speech as the basis for recognition. Typically, multi-band systems process speech independently in each sub-band, train classifiers on sub-band features to estimate phone probabilities, and combine these probabilities to form the best word hypothesis.
Unlike previous multi-band automatic speech recognition (ASR) systems, this work uses multi-band graphical models to classify a set of intermediate attributes of speech instead of phonemes directly. Intermediate attributes of speech can be linguistically motivated or derived automatically from the data. Both of these types of attributes are explored in this work with more emphasis on automatically-derived attributes.
All of the proposed elements of this system are aimed at improving the robustness of a speech recognizer to unseen noise conditions, i.e., environmental conditions for which the recognizer was not trained. Because humans exhibit a great amount of tolerance to noise compared to state-of-the-art speech recognizers, it may be helpful to imitate certain characteristics of human hearing to improve the robustness of ASR systems. Multi-band processing takes its inspiration from the way humans exploit redundant cues found in multiple frequency regions to maintain robust recognition. Classification of intermediate speech attributes is motivated by studies that show how people can discern certain speech attributes, like voicing and nasality, despite confounding noise. We hypothesize that our proposed multi-band approach to speech recognition will significantly improve the recognizer's performance on noisy speech.
A natural approach to developing control architectures for complex tasks is to design multiple simple components, each addressing one aspect of the task domain, and provide a framework for combining the results. When an agent must learn these components by interacting with the environment, as in a Markov decision problem, the designer must take care not to sacrifice an optimal solution for a convenient representation. We propose an architecture that extends the well-known Q learning framework. It permits each component of a controller to compute a value function corresponding to a particular reward signal in the environment, with a supervisor executing the policy that optimizes the average value over all components. This allows each component to learn an estimate of its true value function in advance and improve it while the supervisor executes the policy implied by its agents, and in certain circumstances leads to an optimal policy. Our research applies this approach to complex motor learning problems, which require agents to balance a variety of constraints. We also plan to adapt this method to current techniques that allow researchers to specify partial programs solving Markov decision problems.
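The sketch below is a minimal version of the architecture described above: each component maintains its own Q-table for its own reward signal, and the supervisor acts greedily with respect to the average of the component values. Component updates bootstrap from the action the supervisor will actually take (a SARSA-style backup), so each component estimates its value under the supervisor's policy. State and action spaces, learning rate, and discount are placeholders.

    import numpy as np

    class QDecompositionAgent:
        """Each component j keeps a Q-table for its own reward; the supervisor
        acts greedily with respect to the average of the component values."""

        def __init__(self, n_components, n_states, n_actions, alpha=0.1, gamma=0.95):
            self.Q = np.zeros((n_components, n_states, n_actions))
            self.alpha, self.gamma = alpha, gamma

        def act(self, state):
            # Supervisor: maximize the average component value in this state.
            return int(np.argmax(self.Q[:, state, :].mean(axis=0)))

        def update(self, state, action, rewards, next_state):
            # rewards[j] is the reward observed by component j on this transition.
            next_action = self.act(next_state)   # the supervisor's next choice
            for j, r in enumerate(rewards):
                target = r + self.gamma * self.Q[j, next_state, next_action]
                self.Q[j, state, action] += self.alpha * (target - self.Q[j, state, action])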
There is growing interest in building large knowledge bases (KBs) of everyday knowledge about the world, comprising tens or hundreds of thousands of assertions. But with large KBs, general-purpose reasoners used to answer queries tend to suffer from combinatorial explosion. One promising approach to grappling with the complexity of such KBs is to structure the content into multiple domain- or task-specific partitions. But how are the partitions created, and how do we exploit them in reasoning?
Our research concerns detecting and exploiting the implicit structure of knowledge to make reasoning more focused and efficient. In particular, we investigate two closely related questions: (1) How can we automatically decompose a large KB into a network of tightly-focused, minimally-connected partitions? (2) Given a network of (possibly heterogeneous) knowledge systems, how can we achieve efficient global reasoning?
In our approach, a first-order or propositional theory is partitioned into tightly coupled subtheories according to the language of the axioms in the theory [1]. This partitioning induces a graphical representation where a node represents a particular partition or subtheory and an arc represents the shared language between subtheories [2].
We developed a family of message-passing (MP) algorithms [1,3] for reasoning with partitions of axioms in propositional logic (PL) and first-order logic (FOL). We analyzed the computational benefit of our algorithms and identified those parameters of a partitioning that influence the efficiency of computation. These inference algorithms are sound and complete, i.e., they provide all the correct answers and only correct answers. The efficiency of our algorithms has been shown empirically to be superior to that of widely used inference strategies.
In this work we investigate a general-purpose planning approach that finds plans in structured domains. Our algorithm factors a planning domain into subdomains and finds plans using this factorization. The runtime for our algorithm scales linearly with the size of the domain, and exponentially with the size of the largest subdomain and the amount of interaction between subdomains. We achieve this by limiting the amount of back-and-forth interactions that a plan allows between subdomains, and by choosing a factoring that minimizes this interaction. We do so using earlier work on graph partitioning and treewidth, but the user may also provide a factorization to the algorithm. We prove that our planning algorithm is sound and complete.
Many real-world planning domains have the decomposable structure that our approach exploits. We implemented and tested our algorithm in such domains and discovered that it finds plans in a fraction of the time taken by state-of-the-art planners, and scales to very large problems.
First-order probabilistic languages (FOPLs) combine the expressive power of first-order logic with the uncertainty handling of probability theory. Our group aims to apply these languages to interesting real-world domains. However, as we consider FOPLs of increasing complexity, inference grows difficult; and so, when we define useful languages, we must simultaneously work on developing tractable inference algorithms.
In recent years, several families of FOPLs have been proposed, but no clear consensus has emerged about what the "right" language is, either in general, or in specific application domains. We are investigating these proposed languages with respect to expressive power, efficiency of inference, and ease of modeling.
Much of our current research is inspired by Avi Pfeffer's probabilistic relational models (PRMs), a FOPL family based on semantic networks. We have extended PRMs in several ways, most notably by removing the unique-names assumption, and thus introducing uncertainty over the number of objects present in the system. This has permitted us to apply our work to problems such as data association (vehicle tracking), and, more recently, citation clustering. We plan to continue working on information extraction from the web, and also on problems from computational biology, such as modeling motifs in DNA sequences.
Exact inference in these domains is, in general, not tractable, and so we have developed an approximate approach based on the Markov Chain Monte Carlo algorithm, augmented by intelligent proposal distributions. We are also developing deterministic methods, in which the relational structure of the domain is used to motivate a structured variational approximation to the true posterior.
We provide algorithms for determining the belief state of an agent, i.e., its knowledge about the state of the world. We describe our domains using a logical action language that allows nondeterministic actions and partial observability. Our algorithms update an initial belief state with every execution of an action and when collecting observations. This iterative updating process is called logical filtering. Our algorithms are computationally superior to current methods that are used in nondeterministic planning. Several classes of dynamic systems that we identify allow particularly efficient filtering. In some cases our algorithms compute the filtering of action/observation sequences of arbitrary length efficiently while maintaining compact representation of belief states over time. Other cases allow efficient approximation of the belief state under reasonable conditions. Some of the properties of domains that we identify can be used to launch further investigation into efficient projection, execution monitoring, planning, and diagnosis.
We are exploring efficient techniques for reasoning under uncertainty in first-order domains. Given a knowledge base that contains sentences of first-order logic, each of which may be adorned with a degree of belief, i.e., a probability of truth, we would like to compute the degree of belief of a new sentence. Our technique involves transforming such a knowledge base into a probability distribution over possible worlds, which defines the degree of belief of every sentence of the language. Our current work involves constructing maximum entropy (or log-linear) distributions. We find that this approach has appealing inferential properties and a compact representation. In particular, maximum entropy distributions can be represented by graphical models whose structure corresponds to the structure of the original knowledge base, and conditional independences in such models can be leveraged to obtain efficient inference algorithms. Importantly, we have found that the complexity of probabilistic reasoning under these maximum entropy distributions is no harder than the corresponding deterministic inference problems.
Simultaneous localization and mapping is a fundamental problem in mobile robotics: while a robot navigates in an unknown environment, it must incrementally build a map of its surroundings and localize itself within that map. Traditional approaches to the problem are based upon Kalman filters, but suffer from complexity issues: first, the belief state grows quadratically in the size of the map; and second, the filtering operation can take time quadratic in the size of the map. I have developed a linear-space filter that maintains a tractable approximation of the filtered belief state as a thin junction tree. The junction tree grows under measurement and motion updates and is periodically "thinned" to remain tractable. The time complexity of the filter operation is linear in the size of the map, and further approximations permit constant-time approximate filtering.
We propose a simple analytic solution to the eigenvector segmentation problem. In the absence of noise, we derive a rank constraint, from which one can determine the number of groups n present in the scene. Once n has been determined, the segmentation of the image measurements can be obtained from the roots of a polynomial of degree n in one variable. Since the coefficients of the polynomial are computed by solving a linear system, we show that there is a unique global solution to the eigenvector segmentation problem, which can be obtained in closed form when n < 5.
This purely algebraic solution is shown to be robust in the presence of noise and can be used to initialize an optimal algorithm. We derive such an optimal objective function for the case of zero-mean Gaussian noise on the entries of the eigenvector. We then study the case of multiple eigenvectors and show that it can be reduced to a collection of polynomial segmentation problems.
We present experimental results on intensity-based and motion-based image segmentation. Our experiments show that our algorithm performs as well as or better than Kmeans and EM, and is at least eight times faster, because it only needs to solve one linear system in n variables plus one polynomial of degree n in one variable.
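A minimal sketch of the algebraic step for a single eigenvector: since the entries ideally take only n distinct values, they are the roots of a degree-n polynomial whose coefficients can be found by solving one linear least-squares system; each entry is then assigned to the nearest root. Determining n from the rank constraint and the optimal refinement under noise are omitted, and complex roots are simply projected to the real line here.

    import numpy as np

    def polysegment_1d(v, n):
        """Segment the entries of eigenvector v into n groups via the roots of a
        degree-n polynomial fit to the (approximately constant) values of v."""
        # Each entry x ideally satisfies x^n + c_{n-1} x^{n-1} + ... + c_0 = 0.
        powers = np.vstack([v ** k for k in range(n)]).T       # columns: 1, x, ..., x^{n-1}
        coeffs, *_ = np.linalg.lstsq(powers, -v ** n, rcond=None)
        roots = np.roots(np.concatenate(([1.0], coeffs[::-1])))  # monic polynomial
        roots = np.real(roots)
        labels = np.argmin(np.abs(v[:, None] - roots[None, :]), axis=1)
        return labels, roots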

Figure 1: A comparison of Kmeans, EM, and Polysegment on intensity-based segmentation of the penguin image

Figure 2: Motion based segmentation of a sequence with two moving robots
We study the multi-frame structure from motion problem when the camera translates on a plane with small baselines and arbitrary rotations. This case shows up in many practical applications, for example, in ground robot navigation. We consider the framework for small baselines presented in [1], in which a factorization method is used to compute the structure and motion parameters accurately, efficiently, and with guaranteed convergence. When the camera translates on a plane, the algorithm in [1] cannot be applied because the estimation matrix drops rank, causing the equations to no longer be linear. In this project, we show how to linearly solve those equations while preserving the accuracy, speed, and convergence properties of the non-planar algorithm. We evaluate the proposed algorithms on synthetic and real image sequences, and compare our results with those of the optimal algorithm. The proposed algorithms are fast and accurate, produce fewer than 0.3% outliers, and work well for small-to-medium baselines and for both planar and non-planar motions.
We consider the problem of identifying n linear subspaces in a K-dimensional linear space from a collection of sample points drawn from these subspaces; we refer to this problem as generalized principal component analysis (GPCA).
In the absence of noise, we cast GPCA in an algebraic geometric framework in which the number of classes n becomes the degree of a certain polynomial and the classes themselves become the factors (roots) of such a polynomial. For subspaces of dimension k = K-1, we show that GPCA is equivalent to factoring homogeneous polynomials of degree n in K variables, and provide a polynomial time solution to this problem using linear algebraic techniques. For subspaces of arbitrary dimension k < K-1, we show that GPCA can be reduced to the case k = K'-1 by first projecting the data into a K'-dimensional subspace of R^K.
In the presence of noise, we cast GPCA as a constrained nonlinear least squares problem which minimizes the reprojection error subject to all mixture constraints. By converting this constrained problem into an unconstrained one, we obtain an objective function from which the subspaces can be directly recovered using standard nonlinear optimization techniques. We use the algebraic solution to initialize maximum likelihood algorithms based on nonlinear optimization or iterative algorithms such as EM.
GPCA can be applied to a variety of estimation problems in which the data comes simultaneously from multiple (approximately) linear models. We apply GPCA to the multibody 3D and affine structure from motion problem in computer vision, i.e., the problem of estimating the 3D motion of multiple moving objects from 2D imagery. We also apply GPCA to image segmentation and eigenvector segmentation.
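The sketch below works through the simplest instance, matching the figure: n = 2 lines through the origin in the plane. The data are embedded with the degree-2 Veronese map, the vanishing homogeneous polynomial is recovered as a null vector of the embedded data matrix, and its two linear factors (found here by rooting a univariate quadratic) give the normals of the two lines. Degenerate configurations and the noise-optimal refinement described above are not handled.

    import numpy as np

    def gpca_two_lines(points):
        """Minimal hyperplane-case sketch of GPCA for 2 lines through the origin
        in the plane: fit c0*x^2 + c1*x*y + c2*y^2 = 0, then factor it."""
        x, y = points[:, 0], points[:, 1]
        V = np.column_stack([x ** 2, x * y, y ** 2])   # Veronese (degree-2) embedding
        _, _, Vt = np.linalg.svd(V)
        c0, c1, c2 = Vt[-1]                            # null vector = polynomial coefficients
        # Factor c0 x^2 + c1 x y + c2 y^2 via the roots of c0 t^2 + c1 t + c2 with t = x / y,
        # giving the linear factors (x - t_i y) and hence line normals (1, -t_i).
        roots = np.real(np.roots([c0, c1, c2]))
        normals = [np.array([1.0, -r]) / np.hypot(1.0, r) for r in roots]
        labels = np.argmin([np.abs(points @ nrm) for nrm in normals], axis=0)
        return normals, labels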

Figure 1: Generalized principal component analysis: the case of 2 lines in the XY plane
In [1,2] we present a geometric approach for the analysis of dynamic scenes containing multiple rigidly moving objects seen in two perspective views. Our approach exploits the algebraic and geometric properties of the so-called multibody epipolar constraint and its associated multibody fundamental matrix, which are natural generalizations of the epipolar constraint and of the fundamental matrix to multiple moving objects. We derive a rank constraint on the image points from which one can estimate the number of independent motions and linearly solve for the multibody fundamental matrix. We prove that the epipoles of each independent motion lie exactly in the intersection of the left null space of the multibody fundamental matrix with the so-called Veronese surface. We then show that individual epipoles and epipolar lines can be uniformly and efficiently computed by using a novel polynomial factorization technique. Given the epipoles and epipolar lines, the estimation of individual fundamental matrices becomes a linear problem. Then, motion and feature point segmentation are automatically obtained from either the epipoles and epipolar lines or the individual fundamental matrices.
In [3] we present an algebraic geometric approach for segmenting both static and dynamic scenes from image intensities. We introduce the multibody affine constraint as a geometric relationship between the motion of multiple objects and the image intensities generated by them. This constraint is satisfied by all the pixels, regardless of the body to which they belong and regardless of depth discontinuities or perspective effects. We propose a polynomial factorization technique that estimates the number of affine motion models as well as their motion parameters in polynomial time. The factorization technique is used to initialize a nonlinear algorithm that minimizes the algebraic error defined by the multibody affine constraint. Our approach is based solely on image intensities, hence it does not require feature tracking or correspondences. It is therefore a natural generalization of the so-called direct methods for single-body structure from motion to multiple moving objects.

Figure 1: Segmentation of three independently moving objects

Figure 2: Segmentation of two moving robots
Decisions are based on information. More often than not, decision-relevant information is a mixture of measurements and perceptions. For example, in deciding on whether or not to buy a house, what might be called measurement-based information pertains to the price of the house, the taxes, the mortgage rate, age, area, size of the lot, etc., while the perception-based information relates to the appearance of the house, the quality of construction, security, access to public transportation, etc.
Humans have a remarkable capability to make rational decisions based only on perceptions of values of decision variables, without any measurements and any computations. It is this capability that makes it possible for humans to drive in heavy traffic, play golf, park a car, and summarize a story. And it is this capability that the existing methods of decision analysis do not possess.
The inability of existing methods of decision analysis to operate on perception-based information has a serious implication. More specifically, in most realistic settings, decision-relevant information includes probabilities of possible outcomes and future events. For example, in buying a house, an important consideration might be the probability that prices of houses will be rising in the months ahead. In general, decision-relevant probabilities are subjective and, in essence, are perceptions of likelihood. As perceptions, such probabilities are intrinsically imprecise. Computation with perception-based probabilities is an essential capability which the existing methods of decision analysis do not have.
As an illustration, consider a simple trip-planning problem: I have to fly from city A to city D and need to get there as soon as possible. There are two alternatives: (a) fly to D via B; and (b) fly to D via C. (a) will bring me to D earlier than (b). However, the connection time at B is short, and there is a likelihood that I may miss it, in which case I will have to take the next flight and will arrive in D much later. Neither I nor anyone else knows the numerical value of the probability that I may miss the connection. With this in mind, should I choose (a) or (b)? Decision analysis does not provide an answer to this question.
In perception-based decision analysis, or PDA for short, perception-based information is dealt with through the use of the computational theory of perceptions (CTP). In CTP, the point of departure is not a perception per se, but its description in a natural language. Thus, a perception is equated to its description. In this way, computation with perceptions is reduced to computation with propositions drawn from a natural language.
Reflecting the bounded ability of human sensory organs and, ultimately, the brain, to resolve detail and store information, perceptions are intrinsically imprecise. More specifically, perceptions are f-granular in the sense that (a) the boundaries of perceived classes are fuzzy; and (b) the values of perceived attributes are granulated, with a granule being a clump of values drawn together by indistinguishability, similarity, proximity or functionality. For example, the values of age as a granulated variable might be: young, middle-aged, old, very old, etc. Similarly, the values of probability might be: likely, not very likely, unlikely, etc.
F-granularity of perceptions puts them well beyond the expressive power of existing meaning-representation languages. In CTP, what is employed for meaning representation is precisiated natural language (PNL). A key idea in PNL is that the meaning of a proposition, p, drawn from a natural language, NL, may be represented as a generalized constraint of the form X isr R, where X is the constrained variable that is implicit in p; R is the implicit constraining relation; and r is an indexing variable which identifies the way in which R constrains X. The principal types of constraints are: possibilistic (r=blank); veristic (r=v); probabilistic (r=p); random set (r=rs); fuzzy graph (r=fg); and usuality (r=u). The constraint X isr R is referred to as a generalized constraint form, GC-form, and plays the role of a precisiation of p, p*. A proposition is precisiable if it has a GC-form. The language of GC-forms and their combinations is referred to as the Generalized Constraint Language, GCL. In relation to NL, GCL plays the role of a precisiation language.
Essentially, PNL is a sublanguage of NL which consists of propositions which are precisiable through translation from NL to GCL. More specifically, PNL is equipped with (a) a dictionary from NL to GCL in which the entries are (p, p*); (b) a dictionary from GCL to protoformal language (PFL), in which p* is paired with its abstracted form, e.g., QA's are B's, where p is: most Swedes are tall; and (c) a collection of protoformal rules of deduction which serves as a system for reasoning with perceptions through the use of goal-directed propagation of generalized constraints.
In PNL, perception-based probability distributions are represented as f-granular probability distributions expressed symbolically as P1\A1 + ... + Pn\An, where the Ai are f-granular values of a random variable, X, and the Pi are f-granular values of their respective probabilities, that is, Pi = Prob{X is Ai}. For example, assume that I have to decide on whether to take an umbrella, given my perception that the f-granular probability distribution of rain is: P = low\no rain + low\light rain + high\moderate rain + low\heavy rain. A decision rule, then, may be: Take an umbrella if the perception of the probability distribution of rain is P. An important part of PDA is a collection of rules for computing with and ranking of f-granular probability distributions.
In spirit as well as in substance, perception-based decision analysis represents a radical departure from existing measurement-based methods. Basically, what PDA adds to decision analysis is a collection of concepts and techniques for basing decisions on perception-based information, with the understanding that measurement-based information is a special case.
A concomitant of progression from measurement-based to perception-based decision analysis is a shift from universal, normative decision principles to customized decision rules which are application- and context-dependent. This is an unavoidable consequence of striving for a rapprochement with the reality of making decisions in an environment of pervasive uncertainty, imprecision, and partial truth.
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-mail: zadeh@cs.berkeley.edu. Research supported in part by ONR N00014-00-1-0621, ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-117, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341 and the BISC Program of UC Berkeley.
Existing search engines have many remarkable capabilities. But what is not among them is deduction capability--the capability to answer a query by drawing on information which resides in various parts of the knowledge base or is augmented by the user. An example of a simple query which cannot be answered by any existing search engine is: "How many UC Berkeley alumni were born in California?" In this case, the query and the query-relevant information are crisp. In the query "How many UC Berkeley alumni are wealthy?" the query-relevant information is crisp but the query is fuzzy. The problem--which is not widely recognized--is that much of the information in the knowledge base of a search engine is perception-based. Methods based on bivalent logic and standard probability theory lack the capability to operate on perception-based information. A search engine with deduction capability is, in effect, a question-answering system.
Limited progress toward a realization of deduction capability is achievable through application of methods based on bivalent logic and standard probability theory. But to move beyond the reach of standard methods it is necessary to change direction. In the approach which is outlined, a concept which plays a pivotal role is that of a prototype--a concept which has a position of centrality in human reasoning, recognition, search and decision processes.
Informally, a prototype may be defined as a sigma-summary, that is, a summary of summaries. With this definition as the point of departure, a prototypical form, or protoform, for short, is defined as an abstracted prototype. As a simple example, the protoform of the proposition "Most Swedes are tall" is "QA's are B's," where Q is a fuzzy quantifier, and A and B are labels of fuzzy sets. Similarly, the protoform of "Usually Robert returns from work at about 6 pm," is "Prob (A) is B," where A is a fuzzy event and B is a fuzzy probability.
Abstraction has levels, just as summarization does. For example, in the case of "Most Swedes are tall," successive abstracted forms are "Most A's are tall," "Most A's are B's," and "QA's are B's."
At a specified level of abstraction, propositions are PF-equivalent if they have identical protoforms. For example, propositions "Usually Robert returns from work at about 6 pm" and "In winter, the average daily temperature in Berkeley is usually about fifteen degrees centigrade," are PF-equivalent. The importance of the concepts of protoform and PF-equivalence derives in large measure from the fact that they serve as a basis for knowledge compression.
A knowledge base is assumed to consist of a factual database, FDB, and a deduction database, DDB. In both FDB and DDB, knowledge is assumed to fall into two categories: (a) crisp and (b) fuzzy. Examples of crisp items of knowledge in FDB might be: “Height of the Eiffel tower is 324m” and “Paris is the capital of France.” Examples of fuzzy items might be “Most Swedes are tall,” and “California has a temperate climate.” Similarly, in DDB, an example of a crisp rule might be “If A and B are crisp convex sets, then their intersection is a crisp convex set.” An example of a fuzzy rule might be “If A and B are fuzzy convex sets, then their intersection is a fuzzy convex set.” A fuzzy rule may be a crisp assertion about fuzzy sets or a fuzzy assertion about crisp sets or a fuzzy assertion about fuzzy sets.
The deduction database is assumed to consist of a logical database and a computational database, with the rules of deduction having the structure of protoforms. An example of a computational rule is: if Q1 A's are B's and Q2 (A and B)'s are C's, then Q1Q2 A's are (B and C)'s, where Q1 and Q2 are fuzzy quantifiers and A, B, and C are labels of fuzzy sets. The number of rules in the computational database is assumed to be very large in order to allow a chaining of rules that may be query-relevant.
A very simple example of deduction in the prototype-centered approach--an example which involves string matching but no chaining--is the following. Suppose that the query is "How many Swedes are very tall?" A protoform of this query is: ?Q A's are 2B, where Q is a fuzzy quantifier and 2B is assumed to represent the meaning of "very B," with the membership function of 2B being the square of the membership function of B. Searching DDB, we find the rule "If Q A's are B's, then Q^(1/2) A's are 2B," whose consequent matches the query, with ?Q instantiated to Q^(1/2), A to "Swedes," and B to "tall." Furthermore, in FDB, we find the fact "Most Swedes are tall," which matches the antecedent of the rule, with Q instantiated to "most," A to "Swedes," and B to "tall." Consequently, the answer to the query is "most^(1/2) Swedes are very tall," where the membership function of most^(1/2) is the square root of the membership function of most in fuzzy arithmetic.
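The sketch below puts numbers to this rule, using hypothetical membership functions chosen purely for illustration: "very tall" squares the membership function of "tall," and the quantifier of the conclusion takes the square root of the membership function of "most," as described above.

    import numpy as np

    # Hypothetical membership function for "tall" on a height axis (cm):
    # a simple ramp from 160 cm to 190 cm, not a value taken from the text.
    heights = np.linspace(140.0, 210.0, 141)
    tall = np.clip((heights - 160.0) / 30.0, 0.0, 1.0)
    very_tall = tall ** 2                      # membership of "very tall" = square of "tall"

    # Hypothetical membership function for the quantifier "most" on the
    # proportion axis, and the derived quantifier "most^(1/2)".
    proportion = np.linspace(0.0, 1.0, 101)
    most = np.clip((proportion - 0.5) / 0.4, 0.0, 1.0)
    most_sqrt = np.sqrt(most)                  # quantifier of "most^(1/2) Swedes are very tall"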
Another example of deduction is a simple version of what is known as the Robert example. Specifically, from the perception, p, "Usually Robert returns from work at about 6:00 pm," what is to be deduced is the answer to the question, q, "What is the probability that Robert is home at about t pm?" where t is a specified time.
In this case, the protoform of p is: Prob(A) is B, where A is an abstraction of the fuzzy event “Robert returns from work at about 6:00pm,” and B is an abstraction of the fuzzy probability “usually.” Similarly, the protoform of q is: Prob(C) is ?D, where C is an abstraction of the fuzzy event “Robert returns from work before about t pm,” and D is an abstraction of the probability which is the object of deduction.
An applicable deduction rule in DDB is
Prob(A) is B
____________
Prob(C) is D
in which D is the solution of a variational problem involving the membership functions of A, B, C, and D. The details of the solution may be found in a forthcoming paper, "Toward a Perception-Based Theory of Probabilistic Reasoning with Imprecise Probabilities," to appear in the Journal of Statistical Planning and Inference.
The rules of deduction in the prototype-centered approach involve generalized constraint propagation, with generalized constraints represented as protoforms. Deduction is carried out through backtracking from query to query-relevant information. The concept of protoform has some links to the concept of ontology in ontological engineering, but unlike the latter is rooted in fuzzy logic and perception-based probability theory.
What should be underscored is that the problem of adding deduction capability to search engines is many-faceted and complex. It would be unrealistic to expect rapid progress toward its solution.
Fuzzy systems, neural networks, and their combination in neuro-fuzzy systems are already well established in data analysis and system control. Neuro-fuzzy systems are especially well suited for the development of interactive data analysis tools, which enable the creation of rule-based knowledge from data and the introduction of a priori knowledge into the process of data analysis. However, recurrent variants of these models, especially recurrent neuro-fuzzy models, are still rarely used.
To simplify the definition of fuzzy systems, or to reduce their complexity, hierarchical and recurrent structures can be used. Thus, more transparent rule bases that are also easier to maintain can be designed. These structures also allow the use of time-delayed input or the reuse of time-delayed output from the fuzzy system itself. Thus, we obtain a rule base that is able to describe dynamic behavior.
In this project we study the capabilities of hierarchical recurrent neuro-fuzzy models. Furthermore, we have developed a neuro-fuzzy model that can be used to learn and optimize hierarchical recurrent fuzzy rule bases from data.
In this project, we study approaches that combine visualization techniques for document collections with conventional keyword and content-based search methods. During the past period we focused on the development of interactive visualization techniques.
For the visualization of document collections and search results we developed models based on growing self-organizing maps. These maps can be used to arrange documents based on the similarity of their content [1]. The trained and labeled map can then be used to visualize the structure of the underlying document collection as well as changes in the collection, e.g., insertion or removal of document subsets [2]. Furthermore, visual information about document density, the match of search hits to specific document groups, and similarity to a given sample document in content-based searching can be given by different coloring methods. Besides the analysis of text document collections, the developed models can also be applied to the analysis of multimedia data [3] or used to post-process search engine result sets. The visualization and clustering of the obtained results provides additional visual information about the outcome of a search, which is more intuitive than a pure ordered list of search hits.
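The sketch below trains a plain (fixed-size) self-organizing map on document vectors such as tf-idf rows, which conveys the basic mechanism: each document pulls its best-matching map unit and that unit's grid neighborhood toward it, so nearby units end up representing similar documents. The growing variant used in the project additionally inserts units where quantization error is high, which is not shown; grid size, learning rate, and neighborhood width are placeholders.

    import numpy as np

    def train_som(docs, grid=(8, 8), epochs=30, lr0=0.5, sigma0=3.0, seed=0):
        """docs: (n_docs, dim) array of document vectors, e.g., tf-idf rows."""
        rng = np.random.default_rng(seed)
        rows, cols = grid
        coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
        W = rng.normal(size=(rows * cols, docs.shape[1]))       # map unit prototypes
        for epoch in range(epochs):
            lr = lr0 * (1 - epoch / epochs)                     # decaying learning rate
            sigma = sigma0 * (1 - epoch / epochs) + 0.5         # shrinking neighborhood
            for x in docs[rng.permutation(len(docs))]:
                bmu = np.argmin(((W - x) ** 2).sum(axis=1))     # best-matching unit
                d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)  # grid distance to the BMU
                h = np.exp(-d2 / (2 * sigma ** 2))              # neighborhood kernel
                W += lr * h[:, None] * (x - W)                  # pull neighborhood toward the doc
        return W, coords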
A further advantage of the developed approaches is that manually defined lists of index terms or a classification hierarchy, which are usually subjectively labeled and require expensive maintenance, are not needed. In rapidly changing document collections especially, such as collections of scientific research publications, classification systems that are not frequently updated and do not reflect the user's classification criteria are usually not accepted.
Basically, in computing with words and perceptions, or CWP for short, the objects of computation are words, propositions and perceptions described in a natural language. A very simple example is: Usually Robert returns from work at about 6 pm. What is the probability that Robert is home at about 6:15 pm? Another elementary example is: A box contains about 20 balls of various sizes. Most are large. There are many more large balls than small balls. How many are neither large nor small?
In science, there is a deep-seated tradition of striving for progression from perceptions to measurements, and from the use of words to the use of numbers. Why and when, then, should the use of CWP be considered?
There are two principal rationales. The first, briefly stated, is that when precision is desired but the needed information is not available, the use of CWP is a necessity rather than an option. The second is that when precision is not needed, the tolerance for imprecision may be exploited to achieve tractability, robustness, simplicity, and low solution cost.
Another important point is that humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements and any computations, e.g., parking a car, driving in city traffic, playing golf and summarizing a book. In performing such tasks, humans employ perception--rather than measurements--of distance, direction, speed, count, likelihood, intent and other attributes.
Reflecting the bounded ability of sensory organs and, ultimately, the brain, to resolve detail, perceptions are intrinsically imprecise. More concretely, perceptions are f-granular in the sense that (a) the perceived values of attributes are fuzzy; and (b) the perceived values of attributes are granular, with a granule being a clump of values drawn together by indistinguishability, similarity, proximity or functionality.
F-granularity of perceptions is the reason why in the enormous literature on perceptions—in fields ranging from linguistics and logic to psychology and neuroscience—one cannot find a theory in which perceptions are objects of computation, as they are in CWP. What should be stressed is that in dealing with perceptions, the point of departure in CWP is not a perception per se, but its description in a natural language. This is a key idea which makes it possible to reduce computation with perceptions to computation with propositions drawn from a natural language.
A related key idea in CWP is that the meaning of a proposition, p, in a natural language may be represented as a generalized constraint of the form X isr R, where X is a constrained variable which, in general, is implicit in p; R is the constraining relation which, like X, is in general implicit in p; and r is an indexing variable whose value identifies the way in which R constrains X. The principal types of constraints are: equality (r is =); possibilistic (r is blank); veristic (r is v); probabilistic (r is p); random set (r is rs); fuzzy graph (r is fg); usuality (r is u); and Pawlak set (r is ps). In this system of classification of constraints, the standard constraint, X belongs to C, where C is a crisp set, is possibilistic. Representation of p as a generalized constraint is the point of departure in what is called precisiated natural language (PNL).
PNL associates with a natural language, NL, a precisiation language, GCL (Generalized Constraint Language), which consists of generalized constraints and their combinations and qualifications. A simple example of an element of GCL is: (X is A) and (Y isu B). A proposition, p, in NL is precisiable if it is translatable into GCL. In effect, PNL is a sublanguage of NL which consists of propositions which are precisiable through translation into GCL. More concretely, PNL is associated with two dictionaries: (a) from NL to GCL, and (b) from GCL to what is referred to as the protoform language (PFL). An element of PFL is an abstracted version of an element of GCL. The translates of p into GCL and PFL are denoted as GC(p) and PF(p), respectively.
In addition, PNL is associated with a deduction database, DDB, which consists of rules of deduction expressed in PFL. An example of such a rule is the intersection/product syllogism: if Q A's are B's and R (A and B)'s are C's, then QR A's are (B and C)'s, where Q and R are fuzzy quantifiers, e.g., most, many, few; A, B and C are fuzzy sets, and QR is the product of Q and R in fuzzy arithmetic.
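As a rough numerical illustration, assuming that fuzzy quantifiers such as most and many are represented as triangular fuzzy numbers on [0, 1] and that their product is approximated componentwise, the intersection/product syllogism can be sketched as follows; the particular representations of most and many are invented for the example.

    # Triangular fuzzy number (a, b, c): support [a, c], peak at b.
    def tfn_product(q, r):
        """Approximate product QR of two positive triangular fuzzy quantifiers.
        The exact product is not triangular; componentwise multiplication is a
        standard first-order approximation used here for illustration."""
        return tuple(qi * ri for qi, ri in zip(q, r))

    most = (0.7, 0.85, 1.0)   # illustrative representation of "most"
    many = (0.5, 0.7, 0.9)    # illustrative representation of "many"

    # If "most A's are B's" and "many (A and B)'s are C's",
    # then "(most x many) A's are (B and C)'s":
    print(tfn_product(most, many))   # -> approximately (0.35, 0.6, 0.9)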
The principal function of PNL is to serve as a system for computation and reasoning with perceptions. A related function is that of serving as a definition language. In this capacity, PNL may be used to (a) define new concepts, e.g., the usual value of a random variable; and (b) redefine existing concepts, e.g., the concept of statistical independence. The need for redefinition arises because standard bivalent-logic-based definitions may lead to counterintuitive conclusions.
Computing with words and perceptions provides a basis for an important generalization of probability theory, namely, perception-based probability theory (PTp). The point of departure in PTp is the assumption that subjective probabilities are, basically, perceptions of likelihood. A key consequence of this assumption is that subjective probabilities are f-granular rather than numerical, as they are assumed to be in standard bivalent-logic-based probability theory, PT.
A basic concept in PTp is that of an f-granular probability distribution, P*(X), of a random variable, X, with X taking values in a space U. Symbolically, P*(X) is expressed as P*(X) = P1\A1 + … + Pn\An, in which the Ai are granules of X, and Pi is a perception of the probability of the event "X is Ai". For example, if X is the length of an object and its granules are labeled short, medium and long, then P*(X) may be expressed as P*(X) = low\short + high\medium + low\long, where low and high are granular probabilities. Basically, PTp adds to PT the capability to operate on perception-based information—a capability which plays an especially important role in decision analysis. More specifically, in most realistic settings in decision analysis, a decision involves a ranking of f-granular probability distributions. Furthermore, employment of the principle of maximization of expected utility leads to a ranking of fuzzy numbers.
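A hypothetical illustration of such an f-granular distribution, with granules and granular probabilities represented as triangular fuzzy sets whose parameters are invented for the example, might look as follows in code.

    # Illustrative f-granular distribution P*(X) = low\short + high\medium + low\long.
    # Granules of X (length, in cm) and granular probabilities are triangular
    # fuzzy sets (a, b, c); all numbers are invented for the example.
    granules = {"short": (0, 10, 20), "medium": (15, 30, 45), "long": (40, 60, 80)}
    granular_probabilities = {"low": (0.0, 0.1, 0.3), "high": (0.5, 0.7, 0.9)}
    p_star = {"short": "low", "medium": "high", "long": "low"}   # granule -> fuzzy probability

    def membership(x, abc):
        """Membership of x in a triangular fuzzy set (a, b, c)."""
        a, b, c = abc
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    # Degree to which a 28 cm object belongs to the granule "medium":
    print(membership(28, granules["medium"]))   # ~0.87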
In the final analysis, the importance of CWP derives from the fact that it opens the door to adding to any measurement-based theory, T, the capability to operate on perception-based information. Conceptually, computationally and mathematically, the perception-based theory, Tp, is significantly more complex than T. In this instance, as in many others, complexity is the price that has to be paid to reduce the gap between theory and reality.
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-Mail: zadeh@cs.berkeley.edu. Research supported in part by ONR N00014-00-1-0621, ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-117, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341 and the BISC Program of UC Berkeley.
In science—and especially in mathematics—it is a universal practice to express definitions in a language based on bivalent logic. Thus, if C is a concept, then under its definition every object, u, is either an instance of C or it is not, with no shades of gray allowed. This deep-seated tradition—which is rooted in the principle of the excluded middle—is in conflict with reality. Furthermore, it rules out the possibility of graceful degradation, leading to counterintuitive conclusions in the spirit of the ancient Greek sorites paradox.
In fuzzy logic—in contrast to bivalent logic—everything is, or is allowed to be, a matter of degree. This is well known, but what is new is the possibility of employing the recently developed fuzzy-logic-based language precisiated natural language (PNL) as a concept-definition language to formulate definitions of concepts of the form “approximate X,” where X is a crisply defined bivalent-logic-based concept. For example, if X is the concept of a linear system, then “approximate X” would be a system that is approximately linear.
The machinery of PNL provides a basis for a far-reaching project aimed at associating with every—or almost every—crisply defined concept X a PNL-based definition of “approximate X,” with the understanding that “approximate X” is a fuzzy concept in the sense that every object x is associated with the degree to which x fits X, with the degree taking values in the unit interval or a partially ordered set. A crisp definition of “approximate X” is not acceptable because it would have the same problems as the crisp definition of X.
As a simple example, consider the concept of a linear system. Under the usual definition of linearity, no physical system is linear. On the other hand, every physical system may be viewed as being approximately linear to a degree. The question is: How can the degree be defined?
More concretely, assume that I want to get a linear amplifier, A, and that the deviation from linearity of A is described by the total harmonic distortion, h, as a function of power output, P. For a given h(P), then, the degree of linearity may be defined in the language of fuzzy if-then rules, a language which is a sublanguage of PNL. In effect, such a definition would associate with h(P) its grade of membership in the fuzzy set of distortion/power functions which are acceptable for my purposes. What is important to note is that the definition would be local or, equivalently, context-dependent, in the sense of being tied to a particular application. What we see is that the standard, crisp, definition of linearity is global (universal, context-independent, objective), whereas the definition of approximate linearity is local (context-dependent, subjective). This is a basic difference between a crisp definition of X and a PNL-based definition of “approximate X.” In effect, the loss of universality is the price which has to be paid to define a concept, C, in a way that enhances its rapport with reality.
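The following sketch, with entirely hypothetical thresholds and distortion curve, shows one way such a local, context-dependent degree of linearity could be computed from h(P).

    def acceptable_distortion(h_percent):
        """Grade of membership of a THD value (in %) in the fuzzy set of
        distortion levels acceptable for one particular application.
        0.1% or less is fully acceptable, 1% or more is unacceptable;
        the thresholds are hypothetical and context-dependent."""
        if h_percent <= 0.1:
            return 1.0
        if h_percent >= 1.0:
            return 0.0
        return (1.0 - h_percent) / 0.9

    def degree_of_linearity(h_of_p, power_levels):
        """Degree to which amplifier A is 'approximately linear', taken here
        as the worst-case (min) acceptability over the power range of interest."""
        return min(acceptable_distortion(h_of_p(p)) for p in power_levels)

    # Illustrative distortion curve h(P): THD grows with output power.
    h = lambda p_watts: 0.05 + 0.002 * p_watts
    print(degree_of_linearity(h, power_levels=range(1, 101)))   # ~0.83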
In principle, with every crisply defined X we can associate a PNL-based definition of “approximate X.” Among the basic concepts for which this can be done are the concepts of stability, optimality, stationarity and statistical independence. But a really intriguing possibility is to formulate a PNL-based definition of “approximate theorem.” It is conceivable that in many realistic settings informative assertions about “approximate X” would of necessity have the form of “approximate theorems,” rather than theorems in the usual sense. This is one of the many basic issues which arise when we cross into the uncharted territory of approximate concepts defined via PNL.
A simple example of “approximate theorem” is an approximate version of Fermat’s theorem. More specifically, assume that the equality x^n + y^n = z^n is replaced with approximate equality. Furthermore, assume that x, y, z are restricted to lie in the interval [1, N]. For a given n, the error, e(n), is defined as the minimum of a normalized value of |x^n + y^n - z^n| over all allowable values of x, y, z. Observing the sequence {e(n)}, n = 3, 4, …, we may form perceptions described as, say, “for almost all n the error is small;” or “the average error is small;” or whatever appears to have a high degree of truth. Such perceptions, which in effect are summaries of the behavior of e(n) as a function of n, may qualify to be called “approximate Fermat’s theorems.” It should be noted that in number theory there is a sizeable literature on approximate Diophantine equations. There are many deep theorems in this literature, all of which are theorems in the usual sense.
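A minimal computational sketch of e(n) is given below, under the assumption (made here only for illustration) that the normalization is division by z^n and that N is small enough for exhaustive search.

    def fermat_error(n, N=25):
        """e(n): minimum of |x**n + y**n - z**n| / z**n over 1 <= x, y, z <= N.
        The normalization by z**n is one possible choice, made here only for
        illustration."""
        best = float("inf")
        for x in range(1, N + 1):
            for y in range(1, N + 1):
                for z in range(1, N + 1):
                    err = abs(x**n + y**n - z**n) / z**n
                    best = min(best, err)
        return best

    # Observe the sequence {e(n)} for a few values of n:
    for n in range(3, 8):
        print(n, fermat_error(n))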
In a sense, an approximate theorem may be viewed as a description of a perception. The concept of a fuzzy theorem was mentioned in my 1975 paper “The Concept of a Linguistic Variable and its Application to Approximate Reasoning.” What was missing at the time was the concept of PNL.
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-Mail: zadeh@cs.berkeley.edu. Research supported in part by ONR Grant N00014-00-1-0621, ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-117, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341 and the BISC Program of UC Berkeley.
As we move further into the realm of intelligent systems, the problem of devising performance metrics for assessing machine intelligence looms larger and larger in importance. The problem is there, but does it have a solution? A somewhat unorthodox view which is articulated in the following is that (a) a complete solution is beyond the reach of existing methods; and (b) a prerequisite to solving the problem is a better understanding of a broader problem, namely, the basic problem of concept definability. To this end, what is presented in the following is a sketch of what may be called a theory of hierarchical definability, or THD for short.
In science, and especially in the natural sciences and mathematics, there is a long-standing tradition of expecting that concepts be defined clearly and precisely. But as we move from the natural sciences to the sciences of the artificial, two basic problems come into view.
The first problem relates to the need to formulate our definitions in ways that can be understood by a machine. For example, if I command a household robot to take the dishes off the table, I must define what I mean by “take the dishes off the table.” Or, if I instruct a machine to summarize a document, I must define what I mean by a summary. And, how can I assess the machine IQ (MIQ) of a machine that executes my commands?
The second problem is that we encounter, much more frequently than in the past, concepts which do not lend themselves to precise definition. Among familiar examples of such concepts are intelligence, creativity, autonomy, adaptivity, relevance, robustness, and causality.
We have been largely unsuccessful in formulating operational definitions of concepts of this nature. Why?
A view that is advanced in the following is that the primary reason for the lack of success is that the concepts in question, and many like them, are intrinsically fuzzy, that is, are a matter of degree. Thus, when we try to define such concepts within the conceptual framework of classical, bivalent logic, we encounter a fundamental incompatibility—an incompatibility between crispness of definitions and fuzziness of the concepts we try to define.
Viewed in a slightly different perspective, the problem relates to the inadequate expressive power of the definition languages which are at our disposal, namely, the natural language and the language of bivalent logic. What this implies is that, to solve the problem, we have to add languages with higher expressive power to our repertoire of definition languages. This is the basic idea that underlies the theory of hierarchical definability, THD.
In THD, the languages that we add are based on fuzzy logic since they must be capable of serving as definition languages for fuzzy concepts. More specifically, the definition languages in THD form a hierarchy represented as (NL, C, F, F.G, PNL), where NL is the lowest member in terms of expressive power, and precisiated natural language (PNL) is the highest. It is understood that every member of the hierarchy subsumes those below it.
The C definition language is the language of mathematical analysis, probability theory, and bivalent logic. This is the language that we learn when we take courses in mathematics, probability theory and logic. The F language is the language of fuzzy logic without granulation, and the F.G language is the language of fuzzy logic with granulation. PNL is a fuzzy-logic-based language with maximal expressive power.
A simple analogy may be of help. In my progression of learning, I start with my knowledge of a natural language. After entering a university and taking courses in mathematics, I add to NL my knowledge of C. At this stage, I can use the union of NL and C as a definition language. Then, I take a course in fuzzy logic. In this course, first I learn F, then F.G and finally PNL. At the end, I can use PNL as a definition language, with the understanding that PNL subsumes all languages below it in the hierarchy.
What is PNL? The basic idea in PNL is that a proposition, p, in a natural language, NL, may be precisiated through translation into a precisiation language. In the case of PNL, the precisiation language is the generalized constraint language (GCL). A generic generalized constraint is represented as Z isr R, where Z is the constrained variable, R is the constraining relation and r is a discrete-valued indexing variable whose values define the ways in which R constrains Z. The principal types of constraints are: possibilistic (r=blank); veristic (r=v); probabilistic (r=p); random set (r=rs); usuality (r=u); fuzzy graph (r=fg); and Pawlak set (r=ps). The rationale for constructing a large variety of constraints is that conventional crisp constraints are incapable of representing the meaning of propositions expressed in a natural language—most of which are intrinsically imprecise—in a form that lends itself to computation.
The elements of GCL are composite generalized constraints that are formed from generic generalized constraints by combination, modification, and qualification. An example of a generalized constraint in GCL is ((Z isp R) and (Z, Y) is S) is unlikely.
By construction, the generalized constraint language is maximally expressive. What this implies is that PNL is the largest subset of a natural language that admits precisiation. Informally, this implication serves as a basis for the conclusion that if a concept, X, cannot be defined in terms of PNL, then, in effect, it is undefinable or, synonymously, amorphic.
In this perspective, the highest level of the definability hierarchy, which is the level above PNL-definability, is that of undefinability or amorphicity. A canonical example of an amorphic concept is that of causality. More specifically, it is not possible to construct a general definition of causality such that, given any two events A and B and the question, “Did A cause B?”, the question could be answered based on the definition. Equivalently, given any definition of causality, it will always be possible to construct examples to which the definition would not apply or would yield counterintuitive results.
In dealing with an amorphic concept, X, what is possible—and what we generally do—is to restrict the domain of applicability of X to instances for which X is definable. For example, in the case of the concept of a summary, which is an amorphic concept, we could restrict the length, type, and other attributes of what we want to summarize. In this sense, an amorphic concept may be partially definable or, p-definable, for short. The concept of p-definability applies to all levels of the definability hierarchy.
In essence, PNL may be viewed as a collection of ordered pairs of the form (p, px), where p is a precisiable proposition in NL and px is a precisiation of p, that is, its translation into GCL. In this sense, PNL may be viewed as a dictionary in which p is an entry and px is its meaning.
In scientific theories, a concept, X, is almost always defined as a crisp (bivalent) concept, meaning that the denotation of X is a crisp set in its universe of discourse. In THD, a concept, X, is associated with a quintuple (X, U, QCS, DF(L), D(DF)) in which X is the concept; U is the space of objects to which X is applicable; QCS is the qualitative complexity scale associated with X; DF(L) is a definition of X in a language L; and D(DF) is the domain of DF, that is, the set of objects to which DF is applicable.
The concept of a qualitative complexity scale plays a key role in THD. Basically, the qualitative complexity scale, QCS, is a linear clustering, QCC1, QCC2, …, QCCm, of qualitative complexity classes of objects in U such that: (a) objects in QCCi are roughly equally complex in relation to the definition, DF, of X; and (b) objects in QCCi+1 have higher complexity than those in QCCi. For example, if X is the concept of volume, then QCC2 may be the class of objects like trees, and QCC5 may be the class of objects like clothing. Each language in the definability hierarchy is associated with a critical threshold on the qualitative complexity scale such that the language cannot be applied to classes above the critical threshold.
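A toy sketch of this bookkeeping, with numerically indexed complexity classes and invented threshold values, is given below.

    # Qualitative complexity classes QCC1 < QCC2 < ... indexed 1..m (illustrative).
    # Each definition language gets a critical threshold; the numbers are hypothetical.
    CRITICAL_THRESHOLD = {"NL": 1, "C": 2, "F": 4, "F.G": 6, "PNL": 8}

    def definable(language, qcc_index):
        """A concept instance in complexity class QCC_i is definable in
        `language` only if i does not exceed the language's critical threshold."""
        return qcc_index <= CRITICAL_THRESHOLD[language]

    print(definable("C", 3))    # False: beyond the reach of the bivalent-logic language C
    print(definable("PNL", 3))  # True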
As the lowest member of the definability hierarchy, the C language has a low expressive power, with the consequence that the associated critical threshold is near the low end of the qualitative complexity scale. In particular, the C language cannot be used to define fuzzy concepts. Thus, its use to define concepts which, in reality, are fuzzy concepts leads to counterintuitive conclusions. An example is the conventional C-language-based definition of stability. Since stability, in general, is a matter of degree, its definition as a crisp concept leads to paradoxes similar to the ancient Greek sorites paradox. To define stability as a fuzzy concept—which in reality it is—what is needed is PNL. The same applies to the concept of causality. Thus, causality can be defined as a crisp concept only for complexity classes which lie close to the low end of the qualitative complexity scale.
Another important point is that almost every concept has some degree of amorphicity, with a concept such as causality being amorphic to a high degree. But even such basic concepts as volume, density, edge, derivative and optimality have domains of amorphicity which are apparent in many real-world settings. What this implies is that many basic concepts may require redefinition in terms of PNL.
Does PNL have a significant role to play in devising metrics of performance for intelligent systems? This is an issue that is not addressed in the brief sketch of the theory of hierarchical definability. But I have no doubt that it will, since the concept of intelligence is much too complex to lend itself to analysis through the use of existing bivalent-logic-based methods.
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-mail: zadeh@cs.berkeley.edu. Research supported in part by ONR Contract N00014-00-1-0621, ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-117, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341 and the BISC Program of UC Berkeley.
Almost all scientific theories are based on Aristotelian logic and probability theory. Use of these theories has led to brilliant successes that are visible to all. But what is less visible is that alongside the brilliant successes lie many problem areas where progress has been slow or nonexistent. We cannot automate driving in city traffic, we cannot construct programs that can summarize non-stereotypical stories, and our capacity to do economic forecasting without human intervention leaves much to be desired.
To understand the reasons for the mixed picture, it is germane to start with the thesis that, in general, a system for performing a specified task may be viewed as having three principal components: (1) hardware; (2) software; and (3) what may be called brainware--a complex of concepts, theories, methods, and algorithms that govern the functioning of hardware and software.
The capability of a system to perform a specified task is a function of the capabilities of its hardware, software, and brainware components. In general, there is a tradeoff between these capabilities. However, there are important classes of problems in which the overall capability is limited primarily by the structure of brainware and, more particularly, by the underlying systems for logical reasoning and dealing with uncertainty.
What may be called the Intractability Principle is a thesis that is focused on problems that are solvable by humans, but are difficult for machines. More specifically, the principle relates machine-solvability to the structure of brainware. A basic concept that underlies the principle is that of a qualitative complexity scale, or QCS for short. The low end of QCS consists of problems/tasks that are easy to solve/perform. The opposite holds for the high end. An M-scale is a QCS that applies to machines; an H-scale applies to humans. For simplicity, QCS is assumed to be linearly ordered; it can be partially ordered, if needed. An example of an M-scale is the automation of driving a car. At the low end is the problem of automation of driving on a freeway with no traffic. At the high end is the automation of driving in Istanbul.
Stated informally, the thesis has two parts, (1) negative, and (2) positive, which are summarized in the following. (1) As the complexity of a problem increases, a critical threshold is reached beyond which a solution cannot be achieved through the use of techniques based on two-valued logical systems and probability theory. On an M-scale, problems that lie to the right of the critical threshold are said to be Z-hard. (2) Beyond the critical threshold, achievement of a solution necessitates the use of fuzzy-logic-based methodology of computing with words (CW) and the perception-based theory of probabilistic reasoning (PTp).
The basic idea underlying the Intractability Principle is that problems that lie beyond the critical threshold are intractable by conventional measurement-based methods. What is not widely recognized is that many problems that are simple for humans fall into this category. Here are a few examples of Z-hard problems.
(1) Automation of driving a car. In this case, as stated already, on one end of the complexity scale we have automation of driving on a freeway with no traffic. On the other end is automation of driving in Istanbul. Humans can drive in Istanbul without any measurements or any computations. At this juncture, no conceivable system can solve the problem of automation of driving in Istanbul.
(2) Machine-execution of the instruction: Make a right turn at the intersection. In this case, at one end of the scale we have an intersection on the outskirts of Princeton. On the other end, an intersection in the center of New York.
(3) Summarization. On one end of the scale is a short stereotypical story. On the other end is a book. As a task, summarization is an order of magnitude more complex than machine translation. In (1), (2), and (3), problems at the high end of the scale are Z-hard.
There are two key points that require comment. First, in most real-world problems that are simple for humans, the critical threshold is not a remote limit that is of no concern. On the contrary, it underscores the intrinsic limitation of techniques based on Aristotelian logic and probability theory to deal with problems in which decision-relevant information is, for the most part, perception-based.
The second point relates to the pivotal role of the methodology of computing with words in making it possible to solve problems that lie beyond the critical threshold. The crux of the matter is that in most problems that lie beyond the critical threshold, perception-based information is described by propositions expressed in a natural language, e.g., "traffic is usually very heavy in late afternoon." In general, such propositions are f-granular in the sense that (1) the boundaries of perceived classes are unsharp; and (2) the values of attributes are granulated, with a granule being a clump of values drawn together by indistinguishability, similarity, proximity, and functionality. The problem with theories based on Aristotelian logic and probability theory is that they do not have the capability to deal with f-granular propositions drawn from a natural language. In large measure, the importance of computing with words derives from the fact that it does provide this capability.
A key concept in computing with words is that of Precisiated Natural Language (PNL). Basically, PNL is a subset of a natural language, NL, which consists of propositions that are precisiable through translation into what is called the Generalized Constraint Language, GCL. PNL serves an important function as a definition language, with the language of fuzzy if-then rules being a sublanguage of PNL.
Computing with words provides a basis for an important generalization of probability theory--from standard probability theory, PT, to perception-based probability theory, PTp. Separately and in combination, computing with words and perception-based theory of probabilistic reasoning open the door to a solution of problems that lie beyond the critical threshold. This is the principal rationale for the positive thesis of the Intractability Principle.
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-mail: zadeh@cs.berkeley.edu. Research supported in part by ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-117, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341, and the BISC Program of UC Berkeley.
Probability theory has long played—and is continuing to play—the role of the principal tool for dealing with problems in which uncertainty is a significant factor. The history of probability theory is a history of important advances in our understanding of decision-making in uncertain environments. But what we see alongside the brilliant successes are problem areas in which progress has been elusive. An example is the class of problems in which probabilities, utilities and relations are ill-defined in ways that put such problems well beyond the reach of existing methods.
A thesis that is put forth in this paper is that standard probability theory, call it PT, has fundamental limitations—limitations which are rooted in the fact that PT is based on bivalent logic. It is this thesis that underlies the radical-sounding title of the paper: It is a Fundamental Limitation to Base Probability Theory on Bivalent Logic.
The principal rationale for this thesis is that the conceptual structure of PT is directed at addressing the partiality of certainty but not the partiality of possibility and, most importantly, the partiality of truth. A direct consequence of these omissions is that PT lacks the capability to deal with information which is not just partially certain but also partially possible and/or partially true. An example is: Most analysts believe that it is very unlikely that there will be a significant increase in the price of oil in the near future.
The principal negative consequences of basing PT on bivalent logic are the following.
(1) Brittleness (discontinuity). Almost all concepts in PT are bivalent in the sense that a concept, C, is either true or false, with no partiality of truth allowed. For example, events A and B are either independent or not independent. A process, P, is either stationary or nonstationary, and so on. An example of brittleness is: If all A’s are B’s and all B’s are C’s, then all A’s are C’s; but if almost all A’s are B’s and almost all B’s are C’s, then all that can be said is that the proportion of A’s in C’s is between 0 and 1 (a concrete numerical counterexample is sketched following item (3) below). In particular, brittleness of independence creates serious problems in the construction of Bayesian nets, since only an epsilon separates independence from dependence.
(2) The dilemma of “it is possible but not probable.” A simple version of this dilemma is the following. Assume that A is a proper subset of B and that the Lebesgue measure of A is arbitrarily close to the Lebesgue measure of B. Now, what can be said about the probability measure, P(A), given the probability measure P(B)? The only assertion that can be made is that P(A) lies between 0 and P(B). The uninformativeness of this assessment of P(A) leads to counterintuitive conclusions. For example, suppose that with probability 0.99 Robert returns from work within one minute of 6pm. What is the probability that he is home at 6pm? Using PT, with no additional information or the use of the maximum entropy principle, the answer is: between 0 and 1. This simple example is an instance of a basic problem of what to do when we know what is possible but cannot assess the associated probabilities or probability distributions. A case in point relates to assessment of the probability of a worst case scenario.
(3) Incapability to operate on perception-based information. This is the most serious limitation of PT. It is rooted in the fact that much of human knowledge is perception-based, e.g., "Most Swedes are tall;" "It is foggy in San Francisco during the summer;" "It is unlikely to be warm tomorrow;" and "Usually Robert returns from work at about 6pm."
Perceptions are intrinsically imprecise, reflecting the bounded ability of sensory organs and, ultimately, the brain, to resolve detail and store information. More concretely, perceptions are f-granular in the sense that (a) the boundaries of perceived classes are unsharp; and (b) the values of perceived attributes are granulated, with a granule being a clump of values drawn together by indistinguishability, similarity, proximity or functionality.
F-granularity of perceptions puts them well beyond the reach of conventional meaning-representation methods based on bivalent logic. As a consequence, PT, by itself, lacks the capability to operate on perception-based information. As an illustration, PT cannot provide an answer to the query: What is the probability that Robert is home at about t pm, given the perception-based information that (a) usually Robert leaves his office at about 5:30 pm; and (b) usually it takes about 30 minutes to reach home?
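The counterexample promised under item (1) above is easy to construct; the finite sets below are invented solely to show that chaining two "almost all" statements places no useful bound on the conclusion.

    # 95% of A is in B and 95% of B is in C, yet 0% of A is in C.
    B_in_C     = set(range(0, 9500))        # 9500 elements of B that are in C
    B_not_in_C = set(range(9500, 10000))    # 500 elements of B not in C
    B = B_in_C | B_not_in_C

    # A has 100 elements: 95 of them lie in B (all inside B_not_in_C), 5 lie outside B.
    A = set(range(9500, 9595)) | set(range(20000, 20005))
    C = B_in_C                              # C is exactly the part of B that is in C

    frac = lambda X, Y: len(X & Y) / len(X)
    print(frac(A, B))   # 0.95  ("almost all A's are B's")
    print(frac(B, C))   # 0.95  ("almost all B's are C's")
    print(frac(A, C))   # 0.0   (no A is a C)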
What can be done to endow probability theory with the capability to operate on perception-based information? In a recent paper entitled “Toward a Perception-Based Theory of Probabilistic Reasoning with Imprecise Probabilities,” (Journal of Statistical Planning and Inference), I outlined an approach to a restructuring of probability theory which adds this capability to PT. Briefly, the proposed restructuring involves three stages of generalization: (a) f-generalization; (b) f.g-generalization; and (c) nl-generalization. More specifically:
(a) F-generalization involves a progression from crisp sets to fuzzy sets in PT, leading to a generalization of PT which is denoted as PT+. In PT+, probabilities, functions, relations, measures and everything else are allowed to have fuzzy denotations, that is, be a matter of degree. In particular, probabilities described as low, high, not very high, etc. are interpreted as labels of fuzzy subsets of the unit interval or, equivalently, as possibility distributions of their numerical values.
(b) F.g-generalization involves fuzzy granulation of variables, functions, relations, etc., leading to a generalization of PT which is denoted as PT++. By fuzzy granulation of a variable, X, is meant a partition of the range of X into fuzzy granules, with a granule being a clump of values of X which are drawn together by indistinguishability, similarity, proximity or functionality. Membership functions of such granules are usually assumed to be triangular or trapezoidal. Basically, granularity reflects the bounded ability of the human mind to resolve detail and store information.
(c) Nl-generalization involves an addition to PT++ of a capability to operate on propositions expressed in a natural language, with the understanding that such propositions serve as descriptors of perceptions. Nl-generalization of PT leads to perception-based probability theory, denoted as PTp. By construction, PTp has the capability needed to answer the query in the Robert example: What is the probability that Robert is home at about t pm?
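As a hedged sketch of the kind of computation this calls for in the Robert example, "about 5:30 pm" and "about 30 minutes" may be represented as triangular fuzzy numbers and added via the extension principle; the usuality qualification is ignored and all parameters are invented for illustration.

    # Triangular fuzzy numbers (a, b, c) in minutes after noon; illustrative only.
    about_530 = (320, 330, 340)   # "about 5:30 pm"
    about_30  = (20, 30, 40)      # "about 30 minutes"

    def tfn_add(u, v):
        """Extension-principle addition of triangular fuzzy numbers:
        for triangular shapes it reduces to componentwise addition."""
        return tuple(ui + vi for ui, vi in zip(u, v))

    arrival = tfn_add(about_530, about_30)   # -> (340, 360, 380): "about 6:00 pm"

    def membership(t, abc):
        a, b, c = abc
        if t <= a or t >= c:
            return 0.0
        return (t - a) / (b - a) if t <= b else (c - t) / (c - b)

    # Degree to which an arrival time of 6:10 pm is compatible with "about 6:00 pm":
    print(membership(370, arrival))   # 0.5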
The principal thrust of what was said above may be viewed as a call for a recognition that standard probability theory has serious limitations—limitations which are rooted in its bivalent base and the fundamental conflict between bivalence and reality. What is needed to circumvent these limitations is a restructuring of the foundations of probability theory—a restructuring aimed principally at endowing probability theory with the capability to operate on perception-based information.
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-mail: zadeh@cs.berkeley.edu. Research supported in part by ONR Contract N00014-00-1-0621, ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-117, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341 and the BISC Program of UC Berkeley.
This project describes the need for an initiative to design an intelligent search engine based on two main motivations:
The web environment is, for the most part, unstructured and imprecise. To deal with information in the web environment, we need a logic that supports modes of reasoning that are approximate rather than exact. While searches may retrieve thousands of hits, finding decision-relevant and query-relevant information in an imprecise environment is a challenging problem, which has to be addressed. Another less obvious issue is deduction in an unstructured and imprecise environment given the huge stream of complex information.
As a result, intelligent search engines with growing complexity and technological challenges are currently being developed. This requires new technology in terms of understanding, development, engineering design, and visualization. While the technological expertise of each component becomes increasingly complex, there is a need for better integration of each component into a global model adequately capturing the imprecision and deduction capabilities.
The initiative will bring an integrated approach to perception-based information processing and retrieval, including intelligent search engines, by integrating the various components and achievements of its team members. The objective of this initiative is to develop an intelligent computer system with deductive capabilities to conceptually match and rank pages based on predefined linguistic formulations and rules defined by experts or based on a set of known homepages. The Conceptual Fuzzy Set (CFS) model will be used for intelligent information and knowledge retrieval through conceptual matching of both text and images (here defined as "concept"). The selected objects do not need to match the decision criteria exactly, which gives the system a more human-like behavior. The CFS can also be used for constructing fuzzy ontology or terms related to the context of search or query to resolve ambiguity. We intend to combine expert knowledge with the soft computing tools of the Berkeley groups. Expert knowledge needs to be partially converted into artificial intelligence that can better handle the huge information stream. In addition, a sophisticated management work-flow needs to be designed to make optimal use of this information. We believe our current team is unique in the world of perception-based information processing and analysis in tackling this problem. We intend no less than changing the face and practice of intelligent search engines for complex, unstructured, dynamic systems and imprecise environments such as the Internet. The new model can perform conceptual matching that deals with context-dependent word ambiguity and produce results in a format that permits the user to interact dynamically to customize and personalize the search strategy.
This project describes the need for an initiative for intelligent real-time automated decision-making and management based on two main motivations.
First, in recent years, a decline in revenue, the need for more cost-effective strategies, and multicriteria and multiattribute optimization in an imprecise and uncertain environment have emphasized the need for risk and uncertainty management in distributed and complex dynamic systems. Second, there exists an ever-increasing need to improve technology that provides a global solution to modeling, understanding, analyzing, and managing imprecision and risk in real-time automated decision-making for complex distributed dynamic systems.
As a result, intelligent distributed dynamic systems with growing complexity and technological challenges are currently being developed. This requires new technology in terms of development, engineering design, and virtual simulation models. Each of these components adds to the global sum of uncertainty about risk during the decision making process. While the technological expertise of each component becomes increasingly complex, there is a need for better integration of each component into a global model, adequately capturing the uncertainty on key strategic parameters. The uncertainty quantification on such key parameters is required in any type of decision analysis.
The initiative will bring an integrated approach to real-time automated and intelligent management decision making by integrating the various components and achievements of its team members. As the turnaround time for decision and management teams becomes increasingly short, management decisions on new lines of product or service, or on finding a new alternative solution, become increasingly complex, given the huge stream of often imprecise information. We intend to combine expert knowledge with the soft computing tools of the UC Berkeley groups. Expert knowledge is important in decision making and management, but is becoming increasingly complex, time consuming, and expensive. Moreover, expertise from various disciplines is required and needs to be integrated to provide a global solution.
Therefore, expert knowledge needs to be partially converted into artificial intelligence that can better handle the huge information stream and assist management in making sounder decisions. In addition, sophisticated decision-making and management work-flows need to be designed to make optimal use of this information. We believe our current team is unique in the world of intelligent decision making in tackling this problem. We intend no less than changing the face and practice of the real-time automated decision-making process for complex dynamic systems.
The key features of the BISC decision support system are:
BISC-DSS and Autonomous Multi-Agent System:
A key component of any autonomous multi-agent system--especially in an adversarial setting--is the decision module, which should be capable of functioning in an environment of imprecision, uncertainty, and imperfect reliability. BISC-DSS will be focused on the development of such a system and can be used as a decision-support system for ranking of decision alternatives. BISC-DSS can be used:
Other Potential Applications:
The areas in which the methodology could be implemented are diverse. The following are potential applications outside the autonomous multi-agent system.
BISC-DSS can be integrated into the TraS toolbox to develop the Intelligent Tracking System (ITraS). Given information about suspicious activities such as phone calls, emails, meetings, credit card information, and hotel and airline reservations that are stored in a database containing the originator, recipient, locations, times, etc., we can use BISC-DSS and visual data mining to find suspicious patterns in data using geographical maps. The technology developed can detect unusual patterns, raise alarms based on classification of activities, and offer explanations, based on automatic learning techniques, for why a certain activity is placed in a particular class such as "safe," "suspicious," or "dangerous." The underlying techniques can combine expert knowledge and data-driven rules to continually improve classification and adapt to dynamic changes in data and expert knowledge.
BISC-DSS can be integrated into a fuzzy conceptual set toolbox to develop a new tool for intelligent knowledge management and discovery (TIKManD). The model can be used to recognize terrorist activities through data fusion, data mining, and pattern recognition technology, given online textual information from email or homepages, and voice information obtained from authorized wiretapping and/or chat lines or from the huge number of "tips" received immediately after an attack.
Given the ambiguity and imprecision of the "concept" in the Internet, which may be described with both textual and image information, the use of Fuzzy Conceptual Matching (FCM) is a necessity for search engines. In the FCM approach, the "concept" is defined by a series of keywords with different weights depending on the importance of each keyword. Ambiguity in concepts can be defined by a set of imprecise concepts. Each imprecise concept, in fact, can be defined by a set of fuzzy concepts. The fuzzy concepts can then be related to a set of imprecise words given the context. Imprecise words can then be translated into precise words given the ontology and ambiguity resolution through a clarification dialog. By constructing the ontology and fine-tuning the strength of links (weights), we could construct a fuzzy set to integrate piecewise the imprecise concepts and precise words to define the ambiguous concept.
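A simplified sketch, not the actual CFS/FCM implementation, of the idea that a "concept" is a set of weighted keywords and that a degree of match can be computed from those weights; the concept, the weights and the scoring rule are assumptions made for the example.

    # A "concept" as weighted keywords; weights in [0, 1] express importance.
    concept = {"java": 1.0, "programming": 0.8, "coffee": 0.3}   # e.g. "Java (the language)"

    def fuzzy_match(concept, document_terms):
        """Degree of match in [0, 1]: weighted fraction of the concept's
        keywords that occur in the document (one of many possible scores)."""
        total = sum(concept.values())
        hit = sum(w for kw, w in concept.items() if kw in document_terms)
        return hit / total if total else 0.0

    doc = {"object", "oriented", "programming", "java", "virtual", "machine"}
    print(fuzzy_match(concept, doc))   # (1.0 + 0.8) / 2.1, about 0.86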
In this project, first we will present the role of fuzzy logic in the Internet. Then we will present an intelligent model that can mine the Internet to conceptually match and rank homepages based on predefined linguistic formulations and rules defined by experts or based on a set of known homepages. The FCM model will be used for intelligent information and knowledge retrieval through conceptual matching of both text and images (here defined as "concept"). The FCM can also be used for constructing fuzzy ontology or terms related to the context of the query and search to resolve the ambiguity.
Ranking/scoring is used to make billions of financing decisions each year, serving an industry worth hundreds of billions of dollars. On a smaller scale, hundreds of millions of applications were processed by US universities, resulting in over 15 million college enrollments in 2000 and a total revenue/expenditure of over $250 billion. College enrollments are expected to reach over 17 million by the year 2010, with a total revenue/expenditure of over $280 billion.
Credit scoring was first developed in the 1950s and has been used extensively in the last two decades. In the early 1980s, the three major credit bureaus, Equifax, Experian, and TransUnion, worked with the Fair Isaac Company to develop generic scoring models that allow each bureau to offer an individual score based on the contents of the credit bureau's data. FICO scores are used to make billions of financing decisions each year, serving a hundred-billion-dollar industry. Credit scoring is a statistical method for assessing an individual's creditworthiness and the likelihood that the individual will repay his/her loans based on his/her credit history and current credit accounts. The credit report is a snapshot of the credit history, and the credit score is a snapshot of the risk at a particular point in time. Since 1995, scoring has made its biggest contribution in the world of mortgage lending. Mortgage investors such as Freddie Mac and Fannie Mae, the two main government-chartered companies that purchase billions of dollars of newly originated home loans annually, endorsed the Fair Isaac credit bureau risk scores, ignoring subjective considerations, but agreed that lenders should also consider other outside factors when making a decision.
When you apply for credit, whether it is a new credit card, a car or student loan, a mortgage, or other financing, about 40 pieces of information from your credit report are fed into the model. That model provides a numerical score designed to predict your risk as a borrower. When you apply for university/college admission, more than 20 pieces of information from your application are fed into the model. That model provides a numerical score designed to predict your success rate and risk as a student to be admitted.
In this project, we will introduce fuzzy query and ranking to predict risk in an ever-changing world and imprecise environment, including subjective considerations, for several applications including credit scoring, university admission, and locating your favorite restaurant. Fuzzy query and ranking is robust, provides better insight and a bigger picture, captures more of the underlying pattern in the data, and supports flexible querying and intelligent searching. This greater insight makes it easy for users to evaluate the results against the stated criteria and to make decisions faster and with improved confidence. It is very useful when there are multiple criteria and when users want to vary each criterion independently, with different confidence degrees or weighting factors. In the case of crisp queries, we can make multi-criterion decisions and rankings by using the functions AND and OR to aggregate the predicates. In the extended Boolean model or fuzzy logic, one can interpret AND as the fuzzy MIN function and OR as the fuzzy MAX function. Fuzzy querying and ranking is a very flexible tool in which linguistic concepts can be used in queries and ranking in a very natural form. In addition, the selected objects do not need to match the decision criteria exactly, which gives the system a more human-like behavior. Incorporating an electronic intelligent knowledge-based search engine, the results will be in a format that permits the user to interact dynamically with the underlying database and to customize and add information to it. For instance, it will be possible to test an intuitive concept through dynamic interaction between the software and the human mind. This will provide the ability to answer "what if?" questions in order to decrease uncertainty, provide a better risk analysis and, for example, increase the chance of admission or of an improved score.
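A minimal sketch of fuzzy query and ranking with AND interpreted as fuzzy MIN and OR as fuzzy MAX; the criteria, the scores and the particular query are hypothetical.

    # Each candidate is scored in [0, 1] on each criterion (illustrative data).
    candidates = {
        "applicant_1": {"gpa": 0.9, "test": 0.6, "essay": 0.8},
        "applicant_2": {"gpa": 0.7, "test": 0.9, "essay": 0.5},
    }

    def fuzzy_and(*degrees):   # AND interpreted as fuzzy MIN
        return min(degrees)

    def fuzzy_or(*degrees):    # OR interpreted as fuzzy MAX
        return max(degrees)

    def rank(cands):
        """Query: (gpa AND test) OR essay  (one of many possible formulations)."""
        scores = {name: fuzzy_or(fuzzy_and(c["gpa"], c["test"]), c["essay"])
                  for name, c in cands.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    print(rank(candidates))
    # [('applicant_1', 0.8), ('applicant_2', 0.7)]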
This project describes the need for an initiative in reservoir modeling and management based on two main motivations:
As a result, reservoirs with growing geological and economical challenges are currently being explored and produced. This requires the application of new technologies and intelligent data and knowledge integration methods to achieve necessary efficiency and E&P cost reduction.
While the technological expertise of each component (geology, geophysics, reservoir engineering) becomes increasingly interrelated, there is a need for better integration of different components into a comprehensive reservoir model, adequately capturing the uncertainty on key strategic reservoir parameters. The uncertainty quantification on such key parameters is required in any type of decision analysis.
The initiative will bring an integrated approach to reservoir modeling and reservoir management decision making by integrating the various components and achievements of its team members. As the turnaround time for a reservoir modeling team becomes increasingly shorter, management decisions on new wells or new fields become increasingly complex, given the huge stream of often imprecise and uncertain information.
We intend to combine the reservoir expert knowledge of the Stanford groups with soft computing tools of UC Berkeley groups. Expert knowledge is important in reservoir management, but is becoming increasingly complex, time consuming, and expensive. Moreover, expertise from various disciplines is required and needs to be integrated to provide a global solution.
Therefore, expert knowledge needs to be partially converted into artificial intelligence that can better handle the huge information stream and partially automate the process of modeling and decision-making. Integration of knowledge developed by each group will lead to new systems and work-flows for better, faster, and more efficient reservoir management in terms of decision making.
Decision making under uncertainty can only work if each uncertainty component has been properly accounted for in a global solution to reservoir modeling and management. We believe our current team is unique in the world of reservoir modeling in tackling this problem.
A New Direction in AI--Toward a Computational Theory of Perceptions:
Reflecting the bounded ability of the human brain to resolve detail, perceptions are intrinsically imprecise. In more concrete terms, perceptions are f-granular, meaning that (1) the boundaries of perceived classes are unsharp; and (2) the values of attributes are granulated, with a granule being a clump of values (points, objects) drawn together by indistinguishability, similarity, proximity, and functionality. F-granularity of perceptions puts them well beyond the reach of traditional methods of analysis based on predicate logic and/or probability theory. The computational theory of perceptions (CTP) that is outlined here adds to the armamentarium of AI a capability to compute and reason with perception-based information. The point of departure in CTP is the assumption that perceptions are described by propositions drawn from a natural language. In CTP, a proposition, p, is viewed as an answer to a question, and the meaning of p is represented as a generalized constraint. To compute with perceptions, their descriptors are translated into what is called the Generalized Constraint Language (GCL). Then, goal-directed constraint propagation is employed to answer a given query. A concept that plays a key role in CTP is that of precisiated natural language (PNL).
The computational theory of perceptions suggests a new direction in AI--a direction that may enhance the ability of AI to deal with real-world problems in which decision-relevant information is a mixture of measurements and perceptions. What is not widely recognized is that many important problems in AI fall into this category.
Precisiated Natural Language (PNL):
It is a deep-seated tradition in science to view the use of natural languages in scientific theories as a manifestation of mathematical immaturity. The rationale for this tradition is that natural languages are lacking in precision. However, what is not widely recognized is that adherence to this tradition carries a steep price--the inability to exploit the richness of natural languages in a way that lends itself to computation and automated reasoning.
In a significant departure from existing methods, the high expressive power of natural languages is harnessed by a process termed precisiation. In essence, if p is a proposition in a natural language (NL), then precisiation of p results in a representation of the meaning of p in the form of what is referred to as a generalized constraint. With these constraints serving as basic building blocks, composite generalized constraints can be generated by combination, constraint propagation, modification, and qualification. The set of all composite generalized constraints and the associated rules of generation and interpretation constitute the Generalized Constraint Language (GCL). Translation from NL to GCL is governed by the constraint-centered semantics of natural languages (CSNL). Thus, through CSNL, GCL serves as a precisiation language for NL. Precisiated Natural Language (PNL) is a subset of NL which is equipped with constraint-centered semantics and is translatable into GCL. By construction, GCL is maximally expressive. In consequence, PNL is the largest subset of NL that admits precisiation. The expressive power of PNL is far greater than that of conventional predicate-logic-based meaning-representation languages.
The concept of PNL opens the door to a significant enlargement of the role of natural languages in scientific theories and, especially, in information processing, decision, and control. In these and other realms, a particularly important function that PNL can serve is that of a concept definition language--a language that makes it possible to formulate precise definitions of new concepts and redefine those existing concepts that do not provide a good fit to reality.
Perception-Based Decision Analysis (PDA):
Decisions are based on information. More often than not, the information available is a mixture of measurements and perceptions. The problem with perceptions is that they are intrinsically imprecise. A concept that plays a key role in perception-based decision analysis is that of precisiated natural language, PNL. In PDA, precisiated natural language is employed to define the goals, constraints, relations, and decision-relevant information. An important sublanguage of PNL is the language of fuzzy if-then rules, FRL. In this language, a perception of a function, f, is described by a collection of fuzzy if-then rules. Such a collection is referred to as the fuzzy graph of the function f. Employment of PNL in decision analysis adds an important capability--a capability to operate on perception-based information. This capability has the effect of substantially enhancing the ability of DA to deal with real-world problems. More fundamentally, the high expressive power of PNL opens the door to a redefinition of such basic concepts as optimality and causality. Perception-based decision analysis represents a significant change of direction in the evolution of decision analysis. As we move farther into the age of machine intelligence and automation of reasoning, the need for a shift from computing with numbers to computing with words, and from manipulation of measurements to manipulation of perceptions, will cease to be a matter of debate.
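A short sketch of a fuzzy graph: a perception of a function f is summarized by a few fuzzy if-then rules, and a value is obtained by interpolation over the rules; all fuzzy sets and rules below are invented for illustration.

    # Triangular fuzzy sets (a, b, c) over X and Y; purely illustrative.
    X_sets = {"small": (0, 0, 5), "medium": (3, 5, 7), "large": (5, 10, 10)}
    Y_sets = {"low": (0, 0, 4), "high": (6, 10, 10)}

    # Fuzzy graph of f: a perception of f as a small set of if-then rules.
    rules = [("small", "high"), ("medium", "low"), ("large", "high")]

    def mu(x, abc):
        a, b, c = abc
        if x < a or x > c:
            return 0.0
        if x == b:
            return 1.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)

    def f_hat(x):
        """Interpolated output: average of rule-consequent peaks, weighted by
        the firing strength of each rule (a simple defuzzification)."""
        weights = [(mu(x, X_sets[ante]), Y_sets[cons][1]) for ante, cons in rules]
        total = sum(w for w, _ in weights)
        return sum(w * peak for w, peak in weights) / total if total else None

    print(f_hat(4))   # ~2.9: blends "small -> high" with "medium -> low"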
From Computing with Numbers to Computing with Words to Computation with Perceptions--A Paradigm Shift:
The theory put forth in this research is focused on the development of what is referred to as the computational theory of perceptions (CTP)--a theory that comprises a conceptual framework and a methodology for computing and reasoning with perceptions. The base for CTP is the methodology of computing with words (CW). In CW, the objects of computation are words and propositions drawn from a natural language. The point of departure in the computational theory of perceptions is the assumption that perceptions are described as propositions in a natural language. Furthermore, computing and reasoning with perceptions is reduced to computing and reasoning with words.
To be able to compute with perceptions it is necessary to have a means of representing their meaning in a way that lends itself to computation. Conventional approaches to meaning representation cannot serve this purpose because the intrinsic imprecision of perceptions puts them well beyond the expressive power of predicate logic and related systems.
In the computational theory of perceptions, representation of meaning is a preliminary to reasoning with perceptions--a process that starts with a collection of perceptions that constitute the initial data set (IDS) and terminates in a proposition or a collection of propositions that play the role of an answer to a query, that is, the terminal data set (TDS).
The principal aim of the computational theory of perceptions is the development of an automated capability to reason with perception-based information. Existing theories do not have this capability and rely instead on conversion of perceptions into measurements--a process that in many cases is infeasible, unrealistic, or counterproductive. In this perspective, addition of the machinery of the computational theory of perceptions to existing theories may eventually lead to theories that have a superior capability to deal with real-world problems and make it possible to conceive and design systems with a much higher MIQ (Machine IQ) than those we have today.
1Staff
In this project, we will develop and deploy an intelligent computer system called the Tool for Intelligent Knowledge Management and Discovery (TIKManD). The system can mine Internet homepages, emails, chat lines, and/or authorized wiretapping information (which may include multi-lingual information) to recognize, conceptually match, and rank potential terrorist activities (both common and unusual) by the type and seriousness of the activities. This will be done automatically or semi-automatically based on predefined linguistic formulations and rules defined by experts, or based on a set of known terrorist activities given the information provided through law enforcement databases (text and voice) and the huge number of "tips" received immediately after an attack.
To use the wiretapped and/or chat information, we will use state-of-the-art technology to convert voice to text. Because the information to be gathered will include both Arabic and Farsi material, we will use state-of-the-art Arabic-English and Farsi-English dictionaries and thesauruses. We intend to develop a multi-lingual terrorism ontology and dictionary, built both from existing documents related to terrorist activities and from expert input, and linked to an English ontology constructed for terrorism recognition. An intelligent technique based on integrated Latent Semantic Indexing and Self-Organizing Maps (LSI-SOM), developed at LBNL and UC Berkeley, will be used for this aspect of the project.
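A minimal sketch of the LSI portion of this matching step is given below, assuming a toy term-document count matrix and an illustrative rank; it indicates only how a degree of conceptual match could be computed, and is not the project's implementation.

    # Sketch of latent semantic indexing (LSI) for conceptual matching.
    # The term-document matrix, query, and rank are illustrative toy values.
    import numpy as np

    def lsi_similarity(term_doc, query_vec, rank=2):
        """Project documents and a query into a rank-k latent space and
        return cosine similarities as degrees of conceptual match."""
        U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
        Uk, sk = U[:, :rank], s[:rank]
        docs_k = (np.diag(sk) @ Vt[:rank, :]).T      # documents in the latent space
        query_k = query_vec @ Uk                     # fold the query into the same space
        num = docs_k @ query_k
        den = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(query_k) + 1e-12
        return num / den

    # toy 4-term x 3-document matrix (rows: terms, columns: documents)
    X = np.array([[2., 0., 1.],
                  [1., 0., 0.],
                  [0., 3., 1.],
                  [0., 1., 2.]])
    q = np.array([1., 1., 0., 0.])                   # query expressed over the same terms
    print(lsi_similarity(X, q))

In the proposed system, the SOM half of LSI-SOM would then cluster the resulting latent-space vectors; only the LSI ranking is sketched here.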
In addition, a Conceptual Fuzzy Set (CFS) model will be used for intelligent information and knowledge retrieval through conceptual matching of text and voice (here defined as "concepts"). The CFS can be used to construct a fuzzy ontology of terms related to the context of the investigation (terrorism) in order to resolve ambiguity. The model can then compute the degree of conceptual match to an object or query, and the resulting ranking can be used to allocate resources intelligently, given the degree of match between objectives and the resources available.
Finally, we will use our state-of-the-art technology as a tool for effective data fusion and data mining of textual and voice information. Data fusion would attempt to collect seemingly related data to recognize: (1) increased activities in a given geographic area, (2) unusual movements and travel patterns, (3) intelligence gathered through authorized wire tapping of suspected subgroups, (4) unexplainable increase or decrease in routine communication, and (5) integration of reports and voice (text/audio) from different sources.
Data mining would use the above data to pinpoint and extract useful information from the rest. This would also apply to the "response" part of the initiative on how to fuse the huge number of "tips" received immediately after the attack, filter out the irrelevant parts, and act on the useful subset of such data in a speedy manner.
For example, given data about the movements of people of interest, we can use the proposed data analysis technology to identify suspicious patterns of behavior, raise alarms, and perform interactive data analysis on geographical maps of the world or of specific countries. For email and telephone monitoring, standard search technology is not suitable because such messages are often short and non-standard, so the system will need significant prior knowledge about what to look for. Our recent work on fuzzy queries, search engines, and ranking can be leveraged here.
Finally, in phase two we intend to design a secure and distributed IT infrastructure to provide a means for secure communication between resources and to build textual, voice, and image databases online. Given the distributed nature of the information sources, a federated database framework is required to distribute storage and information access across multiple locations.
Tracks the Terrorist (TrasT): phase one of this project includes (1) text data mining, (2) an ontology related to terrorist activities, (3) a tracking system, and (4) a federated data warehouse.
Text data mining and ontology: we intend to develop an ontology and dictionary, built both from existing documents related to terrorist activities and from expert input, and linked to English thesauruses that will be used for terrorism recognition. An intelligent technique based on integrated Latent Semantic Indexing and Self-Organizing Maps (LSI-SOM) will be used in this project.
Tracking system: given information about suspicious activities such as phone calls, emails, meetings, credit card transactions, and hotel and airline reservations, stored in a database recording the originator, recipient, locations, times, etc., we can use visual data mining software developed at Btexact Technologies (a UC Berkeley sponsor) to find suspicious patterns in the data using geographical maps. The technology can detect unusual patterns, raise alarms based on a classification of activities, and offer explanations, based on automatic learning techniques, for why a certain activity is placed in a particular class such as "safe," "suspicious," or "dangerous." The underlying techniques can combine expert knowledge and data-driven rules to continually improve the classification and adapt to dynamic changes in the data and in expert knowledge.
The system is Java-based, web-enabled, and uses MapInfo (commercial mapping software) and an Oracle database to display and access the data. It consists of two main subsystems: modeling and visualization. The modeling subsystem deals with data analysis and explanation generation, while the visualization subsystem represents extracted patterns on geographical maps and enables visual interaction with the modeling back end. For example, US authorities have announced that 5 of the 19 suspected September 11 hijackers met in Las Vegas several days before the attack. Some of the terrorists attended the same training school, had communication tracks, and flew during the same time period to the same location on the same flight. In addition, some of the suspects were on the FBI watch list. The new model is intended to capture and recognize such activities and raise alarms by profiling, e.g., spatial clustering.
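As an illustration only, the sketch below shows one way expert if-then rules and a data-driven anomaly score might be blended to assign the classes mentioned above; the feature names, weights, and thresholds are hypothetical and are not drawn from the system itself.

    # Hypothetical sketch: blend expert rules with a data-driven anomaly score to
    # label an activity as "safe," "suspicious," or "dangerous." All feature names,
    # weights, and thresholds are illustrative, not the deployed rule base.

    def expert_score(activity):
        """Simple additive evidence from expert-defined indicators."""
        score = 0.0
        if activity.get("on_watch_list"):
            score += 0.6
        if activity.get("same_flight_as_flagged_person"):
            score += 0.3
        if activity.get("unusual_travel_pattern"):
            score += 0.2
        return min(score, 1.0)

    def classify(activity, data_driven_score, w_expert=0.5):
        """Combine expert evidence with a learned anomaly score in [0, 1]."""
        s = w_expert * expert_score(activity) + (1 - w_expert) * data_driven_score
        if s >= 0.7:
            return "dangerous", s
        if s >= 0.4:
            return "suspicious", s
        return "safe", s

    print(classify({"on_watch_list": True, "unusual_travel_pattern": True}, 0.55))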
Data warehouse architecture: we will leverage DOE security (funded through the National Collaboratory) to provide a means for secure communication between resources and to build textual databases. Given the nature of the information sources, a data warehouse architecture for information access across multiple locations is required.
1Staff
Born at the beginning of the second half of the last century, decision analysis, or DA for short, was the brainchild of von Neumann, Morgenstern, Wald, and other great intellects of that period. Decision analysis was, and remains, driven by a quest for a theory that is prescriptive, rigorous, quantitative, and precise. The question is: can this aim be achieved? A contention that is advanced in the following is that the aim is unachievable by a prolongation of existing theories. What is needed in decision analysis is a significant shift in direction--a shift from computing with numbers to computing with words and from manipulation of measurements to manipulation of perceptions.
Decisions are based on information. More often than not, the decision-relevant information is a mixture of measurements and perceptions--perceptions exemplified by: it is very unlikely that there will be a significant decrease in the price of oil in the near future. The problem with perceptions is that they are intrinsically imprecise, reflecting the bounded ability of the human mind to resolve detail and store information. More specifically, perceptions are f-granular in the sense that (1) the boundaries of perceived classes are unsharp; and (2) the values of perceived attributes are granular, with a granule being a clump of values drawn together by indistinguishability, similarity, proximity, or functionality. For example, a perception of likelihood may be described as "very unlikely," and a perception of gain as "not very high."
Existing decision theories have a fundamental limitation--they lack the capability to operate on perception-based information. To add this capability to an existing theory, T, three stages of generalization are required: (1) f-generalization, which adds to T the capability to operate on fuzzy sets, leading to a generalized theory denoted as T+; (2) f.g-generalization, which leads to a theory denoted as T++, and adds to T+ the capability to compute with f-granular variables, e.g., a random variable that takes the values small, medium, and large with respective probabilities low, high, and low; and (3) nl-generalization, which leads to a theory denoted as Tp, and adds to T++ the capability to operate on perception-based information expressed in a natural language.
A concept that plays a key role in perception-based decision analysis is that of precisiated natural language, PNL. Basically, PNL is a subset of a natural language, NL, which consists of propositions that can be precisiated through translation into a generalized constraint language. A generalized constraint is expressed as X isr R, where X is the constrained variable; R is the constraining relation; and r is a discrete-valued indexing variable whose value defines the way in which R constrains X. The principal types of constraints are: possibilistic (r = blank); veristic (r = v); probabilistic (r = p); random set (r = rs); fuzzy graph (r = fg); usuality (r = u); and Pawlak set (r = ps).
More general constraints may be generated by combination, modification, and qualification. The collection of all such constraints is the Generalized Constraint Language, GCL. By construction, GCL is maximally expressive. As a consequence, PNL is the largest subset of NL which is precisiable through translation into GCL. In general, X, R, and r are implicit rather than explicit. Thus, if p is a proposition in a natural language, then its translation into GCL involves explicitation of X, R and r in its translate, X isr R.
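A minimal sketch of how such generalized constraints might be represented in software is given below; the data structure and the example propositions (drawn from the surrounding discussion) are illustrative assumptions, not part of PNL itself.

    # Sketch of a representation for generalized constraints "X isr R".
    # The structure and examples are illustrative; r follows the types listed above.
    from dataclasses import dataclass

    @dataclass
    class GeneralizedConstraint:
        X: str       # constrained variable (often implicit in the original proposition)
        r: str       # "" (possibilistic), "v", "p", "rs", "fg", "u", or "ps"
        R: object    # constraining relation, e.g., a fuzzy set label

    # "gain is not very high"                          -> possibilistic constraint on Gain
    c1 = GeneralizedConstraint(X="Gain", r="", R="not very high")
    # "usually Robert returns from work at about 6 pm" -> usuality constraint
    c2 = GeneralizedConstraint(X="ReturnTime(Robert)", r="u", R="about 6 pm")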
In PDA, precisiated natural language is employed to define the goals, constraints, relations, and decision-relevant information. An important sublanguage of PNL is the language of fuzzy if-then rules, FRL. In this language, a perception of a function, Y = f(X), is described by a collection of fuzzy if-then rules, e.g., if X is small then Y is small; if X is medium then Y is large; if X is large then Y is small. Such a collection is referred to as the fuzzy graph of f, f*.
For example, if X is a random variable, then a perception of its probability distribution may be described as: if X is small then its probability is low; if X is medium then its probability is very low; and if X is large then its probability is high. More generally, employment of PNL in decision analysis adds an important capability--a capability to operate on perception-based information. This capability has the effect of substantially enhancing the ability of DA to deal with real-world problems. More fundamentally, the high expressive power of PNL opens the door to a redefinition of such basic concepts as optimality and causality. In particular, what is called into question is the validity of conventional approaches in which optimization is equated to a maximization of a scalar objective function. It is argued that to achieve a close rapport with reality, the use of PNL in optimization is a matter of necessity rather than choice.
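The sketch below gives one possible numeric reading of the fuzzy graph "if X is small then Y is small; if X is medium then Y is large; if X is large then Y is small," assuming illustrative triangular membership functions on [0, 1] and a simple weighted interpolation; it is a sketch of the general idea, not a prescribed computation.

    # Numeric sketch of the fuzzy graph
    #   if X is small then Y is small; if X is medium then Y is large; if X is large then Y is small
    # with illustrative triangular membership functions on [0, 1].

    def tri(x, a, b, c):
        """Triangular membership function with support (a, c) and peak at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    small  = lambda x: tri(x, -0.01, 0.0, 0.5)
    medium = lambda x: tri(x, 0.0, 0.5, 1.0)
    large  = lambda x: tri(x, 0.5, 1.0, 1.01)

    # fuzzy graph f*: antecedent membership paired with a representative output value
    rules = [(small, 0.0), (medium, 1.0), (large, 0.0)]   # small->small, medium->large, large->small

    def f_star(x):
        """Weighted interpolation over the fuzzy graph (one common defuzzified reading)."""
        weights = [mu(x) for mu, _ in rules]
        total = sum(weights) or 1.0
        return sum(w * y for w, (_, y) in zip(weights, rules)) / total

    print(f_star(0.25))   # falls between the "small" and "large" outputs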
Perception-based decision analysis represents a significant change in direction in the evolution of decision analysis. As we move farther into the age of machine intelligence and automation of reasoning, the need for a shift from computing with numbers to computing with words, and from manipulation of measurements to manipulation of perceptions, will cease to be a matter of debate. A case in point is the following example--referred to as the Robert example. There are three progressively more difficult versions of the example. The simplest version is:
Suppose that I need to call Robert by phone at home. What I know is that usually Robert returns from work at about 6:00 p.m. My questions are: (1) What is the probability that Robert is home at 6:30 p.m.? (2) What is the earliest time at which the probability that Robert is home is high? and (3) At 6:30 p.m., should I call Robert person-to-person or station-to-station, assuming that the costs are, respectively, 1 and 2?
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-Mail: zadeh@cs.berkeley.edu. Research supported in part by ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-117, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341, and the BISC Program of UC Berkeley.
It is a deep-seated tradition in science to view the use of natural languages in scientific theories as a manifestation of mathematical immaturity. The rationale for this tradition is that natural languages are lacking in precision. However, what is not recognized to the extent that it should be is that adherence to this tradition carries a steep price. In particular, a direct consequence is that existing scientific theories do not have the capability to operate on perception-based information exemplified by "Most Finns are honest." Such information is usually described in a natural language and is intrinsically imprecise, reflecting a fundamental limitation on the cognitive ability of humans to resolve detail and store information. Because of their imprecision, perceptions do not lend themselves to meaning-representation through the use of precise methods based on predicate logic. This is the principal reason why existing scientific theories do not have the capability to operate on perception-based information.
In a related way, the restricted expressive power of predicate-logic-based languages rules out the possibility of defining many basic concepts such as causality, resemblance, smoothness, and relevance in realistic terms. In this instance, as in many others, the price of precision is over-idealization and lack of robustness.
In a significant departure from existing methods, in the approach described in this work, the high expressive power of natural languages is harnessed by constructing what is called a precisiated natural language (PNL).
In essence, PNL is a subset of a natural language (NL)--a subset that is equipped with constraint-centered semantics (CSNL) and is translatable into what is called the Generalized Constraint Language (GCL). A concept that has a position of centrality in GCL is that of a generalized constraint expressed as X isr R, where X is the constrained variable, R is the constraining relation, and isr (pronounced as ezar) is a variable copula in which r is a discrete-valued variable whose value defines the way in which R constrains X. Among the principal types of constraints are possibilistic, veristic, probabilistic, random-set, usuality, and fuzzy-graph constraints.
With these constraints serving as basic building blocks, more complex (composite) constraints may be constructed through the use of a grammar. The collection of composite constraints forms the Generalized Constraint Language (GCL). The semantics of GCL is defined by the rules that govern combination and propagation of generalized constraints. These rules coincide with the rules of inference in fuzzy logic (FL).
A key idea in PNL is that the meaning of a proposition, p, in PNL may be represented as a generalized constraint that is an element of GCL. Thus, translation of p into GCL is viewed as an explicitation of X, R and r. In this sense, translation is equivalent to explicitation.
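As a hedged illustration, the proposition "Most Finns are honest" mentioned above might be explicitated roughly as follows, assuming the relative sigma-count reading of a fuzzy quantifier (an assumption of this sketch, not a claim about the only possible translate):

    \[
    p:\ \text{Most Finns are honest}
    \;\longrightarrow\;
    \underbrace{\frac{\Sigma\mathrm{Count}(\text{honest} \cap \text{Finns})}
                     {\Sigma\mathrm{Count}(\text{Finns})}}_{X}
    \ \text{is}\
    \underbrace{\textit{most}}_{R},
    \qquad r = \text{blank (possibilistic)}.
    \]

Here X, the proportion of honest Finns, is the constrained variable; the fuzzy quantifier most is the constraining relation; and the blank value of r marks the constraint as possibilistic.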
The concept of a precisiated natural language, the associated methodologies of computing with words, and the computational theory of perceptions open the door to a wide-ranging generalization and restructuring of existing theories, especially in the realms of information processing, decision, and control. In this perspective, what is very likely is that in coming years a number of basic concepts and techniques drawn from linguistics will be playing a much more important role in scientific theories than they do today.
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-Mail: zadeh@cs.berkeley.edu. Research supported in part by ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-117, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341, and the BISC Program of UC Berkeley.
Current-generation automatic speech recognition (ASR) systems assume that words are readily decomposable into constituent phonetic components ("phonemes"). A detailed linguistic dissection of state-of-the-art speech recognition systems indicates that the conventional phonemic "beads-on-a-string" approach is of limited utility, particularly with respect to informal, conversational material. The study shows that there is a significant gap between the observed data and the pronunciation models of current ASR systems. It also shows that many important factors affecting recognition performance are not modeled explicitly in these systems.
Our research analyzes spontaneous speech with respect to three important, but often neglected, components of speech (at least with respect to English ASR). These components are articulatory-acoustic features (AFs), the syllable, and stress accent. The analysis results provide evidence for an alternative approach to speech modeling, one in which the syllable assumes preeminent status and is melded to the lower as well as the higher tiers of linguistic representation through the incorporation of prosodic information such as stress accent. Using concrete examples and statistics from spontaneous speech material, it is shown that there exists a systematic relationship between the realization of AFs and stress accent in conjunction with syllable position. This relationship can be used to provide an accurate and parsimonious characterization of pronunciation variation in spontaneous speech. An approach to automatically extract AFs from the acoustic signal is also developed, as is a system for the automatic stress-accent labeling of spontaneous speech.
Based on the results of these studies, a syllable-centric, multi-tier model of speech recognition is proposed. The model explicitly relates AFs, phonetic segments, and syllable constituents to a framework for lexical representation, and incorporates stress-accent information into recognition. A test-bed implementation of the model is developed using a fuzzy-based approach for combining evidence from various AF sources and a pronunciation-variation modeling technique using AF-variation statistics extracted from data. Experiments on a limited-vocabulary speech recognition task using both automatically derived and fabricated data demonstrate the advantage of incorporating AF and stress-accent modeling within the syllable-centric, multi-tier framework, particularly with respect to pronunciation variation in spontaneous speech. The multi-tier model and the preliminary implementation are also being extended to provide a speech-based interface to specific applications in areas such as telephone directory assistance and Internet information retrieval.
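The sketch below indicates, in very rough form, how graded evidence from articulatory-feature detectors might be combined fuzzily across competing syllable hypotheses; the AF names, detector scores, and the conjunctive (min) combination rule are hypothetical stand-ins, not the test-bed system.

    # Hypothetical sketch of fuzzy combination of articulatory-feature (AF) evidence
    # for competing syllable hypotheses. AF names, detector scores, and the min
    # combination rule are illustrative assumptions.

    def support(af_scores, required):
        """Degree of support for a hypothesis: conjunctive (min) over its required AFs."""
        return min(af_scores.get(af, 0.0) for af in required)

    # per-frame AF detector outputs in [0, 1] (illustrative)
    af_scores = {"voiced": 0.9, "nasal": 0.1, "front-vowel": 0.7, "stop": 0.2}

    hypotheses = {
        "hypothesis A": ["voiced", "front-vowel", "stop"],
        "hypothesis B": ["voiced", "nasal", "front-vowel"],
    }

    ranked = sorted(((support(af_scores, req), h) for h, req in hypotheses.items()), reverse=True)
    for score, hyp in ranked:
        print(f"{hyp}: {score:.2f}")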
The Robert Example is named after my colleague and good friend, Robert Wilensky. The example is intended to serve as a test of the ability of standard probability theory (PT) to deal with perception-based information, e.g., "Usually Robert returns from work at about 6 pm." An unorthodox view that is articulated in the following is that to add to PT the capability to process perception-based information, it is necessary to generalize PT in three stages. The first stage, f-generalization, adds to PT the capability to deal with fuzzy probabilities and fuzzy events--a capability which PT lacks. The result of this generalization is denoted as PT+.
The second stage, g-generalization, adds to PT+ the capability to operate on granulated (linguistic) variables and relations. Granulation plays a key role in exploiting the tolerance for imprecision for achieving robustness, tractability and data compression. G-generalization of PT+ or, equivalently, f.g-generalization of PT, is denoted as PT++.
The third stage, nl-generalization, adds to PT++ the capability to operate on information expressed in a natural language, e.g., "It is very unlikely that there will be a significant increase in the price of oil in the near future." Such information will be referred to as perception-based, and, correspondingly, nl-generalization of PT, PTp, will be referred to as perception-based probability theory. PTp subsumes PT as a special case.
The Robert Example is a relatively simple instance of problems which call for the use of PTp. Following is its description.
I want to call Robert in the evening, at a time when he is likely to be home. The question is: At what time, t, should I call Robert? The decision-relevant information is the probability, P(t), that Robert is home at about t pm.
There are three versions, in order of increasing complexity, of perception-based information which I can use to estimate P(t).
Version 1. Usually Robert returns from work at about 6 pm.
Version 2. Usually Robert leaves his office at about 5:30 pm, and usually it takes about 30 minutes to get home.
Version 3. Usually Robert leaves office at about 5:30 pm. Because of traffic, travel time depends on when he leaves. Specifically, if Robert leaves at about 5:20 or earlier, then travel time is usually about 25 min.; if Robert leaves at about 5:30 pm, then travel time is usually about 30 min; if Robert leaves at 5:40 pm or later, travel time is usually about 35 min.
The problem is to compute P(t) based on this information. Using PTp, the result of computation would be a fuzzy number which represents P(t). A related problem is: What is the earliest time for which P(t) is high?
Solution of Version 1 using PTp is described in the paper "Toward a Perception-Based Theory of Probabilistic Reasoning with Imprecise Probabilities," which is scheduled to appear in a forthcoming issue of the Journal of Statistical Planning and Inference.
It is of interest to note that solution of a crisp version of Version 1 leads to counterintuitive results. Specifically, assume that with probability 0.9 Robert returns from work at 6 pm plus or minus 15 min, and that P(t) is the probability that Robert is home at exactly t pm. Then it is easy to verify that P(t) > 0.9 for t > 6:15, while for times between 5:45 and 6:15 all that can be said is that P(t) lies between 0 and 1.
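The sketch below restates this crisp analysis numerically, assuming (as the text implicitly does) that once Robert is home he stays home; it returns only the interval bounds that the crisp information supports.

    # Crisp version of the Robert example: with probability 0.9 Robert returns between
    # 5:45 pm and 6:15 pm. Only interval bounds on P(t) follow; times are expressed in
    # minutes after 5:00 pm. The "once home, stays home" reading is an assumption.

    def p_home_bounds(t):
        """Lower and upper bounds on P(Robert is home at time t)."""
        if t >= 75:            # 6:15 pm or later: the 0.9 mass has certainly arrived
            return 0.9, 1.0
        if 45 <= t < 75:       # between 5:45 pm and 6:15 pm: nothing sharper than [0, 1]
            return 0.0, 1.0
        return 0.0, 1.0        # before 5:45 pm the given information says nothing sharper either

    for label, t in [("6:30 pm", 90), ("6:00 pm", 60), ("5:30 pm", 30)]:
        lo, hi = p_home_bounds(t)
        print(f"{label}: P(t) lies in [{lo}, {hi}]")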
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-Mail: zadeh@cs.berkeley.edu. Research supported in part by ONR N00014-00-1-0621, ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-117, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341 and the BISC Program of UC Berkeley.
Perceptions play a central role in human cognition. There are close links between perceptions, consciousness and our ability to reason and make decisions on both conscious and subconscious levels. There is an enormous literature dealing with perceptions from every conceivable point of view. And yet, what is not in existence is a theory which makes it possible to compute with perceptions. A framework for such a theory—the computational theory of perceptions (CTP)—is described.
In CTP, the point of departure is the human ability to describe perceptions in a natural language. Thus, in CTP a perception is equated to its description, e.g., “It was very warm during much of the early part of summer.”
Perceptions are intrinsically imprecise, reflecting the bounded ability of sensory organs and ultimately the brain, to resolve detail and store information. More specifically, perceptions are f-granular in the sense that (a) the boundaries of perceived classes are imprecise; and (b) the values of attributes are granular, with a granule being a clump of values drawn together by indistinguishability, similarity, proximity or functionality. F-granularity of perceptions puts them well beyond the reach of existing meaning-representation systems based on bivalent logic.
In CTP, the meaning of a perception, p, is represented through the use of what is referred to as precisiated natural language (PNL). In PNL, the meaning of p is expressed as a generalized constraint on a variable, X, which, in general, is implicit in p. A collection of rules governing generalized constraint propagation is employed for reasoning and computing with perceptions.
CTP adds to existing theories the capability to operate on perception-based information exemplified by “It is very unlikely that there will be a significant increase in the price of oil in the near future.” This capability provides a basis for extending the ability of existing theories—and especially probability theory—to deal with real-world problems.
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-Mail: zadeh@cs.berkeley.edu.
The point of departure in perception-based probability theory, PTp, is the postulate that subjective probabilities are not numerical—as they are assumed to be in the theory of subjective probabilities. Rather, subjective probabilities are perceptions of likelihood, similar to perceptions of length, distance, speed, age, taste, resemblance, etc.
Reflecting the bounded ability of human sensory organs and, ultimately, the brain, to resolve detail and store information, perceptions are intrinsically imprecise. More specifically, perceptions are f-granular in the sense that (a) the mental boundaries of perceived classes are fuzzy, and (b) the perceived values of attributes are granular, with a granule being a fuzzy clump of values drawn together by indistinguishability, similarity, proximity, or functionality. For example, perceptions of age may be described as young, middle-aged, old, very old, etc.
An immediate consequence of the postulate that subjective probabilities are perceptions of likelihood, is the postulate that subjective probabilities are f-granular. Thus, the perceived values of the subjective probability of an event may be described as low, medium, high, very high, etc.
A pivotal concept in PTp is that of an f-granular probability distribution. More specifically, if X is a real-valued random variable with granules labeled small, medium, and large, then its f-granular probability distribution may be expressed symbolically as P = low\small + high\medium + low\large, with the understanding that low, for example, is a perception of the probability that X is small.
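The following sketch shows one way an f-granular probability distribution such as P = low\small + high\medium + low\large could be represented, with illustrative membership functions standing in for the probability granules; it is a sketch under these assumptions, not a definition within PTp.

    # Sketch of the f-granular distribution P = low\small + high\medium + low\large.
    # Each value granule of X is paired with a probability granule; the probability
    # granules are illustrative fuzzy subsets of [0, 1].

    def tri(x, a, b, c):
        """Illustrative triangular membership function."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    prob_granules = {
        "low":  lambda p: tri(p, -0.01, 0.0, 0.4),
        "high": lambda p: tri(p, 0.3, 0.7, 1.01),
    }

    # P = low\small + high\medium + low\large
    P = {"small": "low", "medium": "high", "large": "low"}

    def compatibility(granule, p):
        """Degree to which a crisp probability p agrees with the perceived probability of a granule."""
        return prob_granules[P[granule]](p)

    print(compatibility("medium", 0.6))   # how well a probability of 0.6 fits "high"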
In the main, PTp is a system for computing, reasoning, and decision-making with f-granular probability distributions. Conceptually and computationally, PTp is significantly more complex than standard, measurement-based probability theory, PT. The importance of PTp derives from the fact that, as a generalization of PT, it opens the door to addressing problems and issues which are beyond the reach of PT.
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-Mail: zadeh@cs.berkeley.edu.
Attempts to formulate mathematically precise definitions of basic concepts such as causality, randomness, and probability have a long history. The concept of hierarchical definability that is outlined in this project suggests that such definitions may not exist. Furthermore, it suggests that existing definitions of many basic concepts, among them those of linearity, stability, statistical independence, and Pareto-optimality, may be in need of reformulation.
In essence, definability is concerned with whether and how a concept, X, can be defined in a way that lends itself to mathematical analysis and computation. In mathematics, definability of mathematical concepts is taken for granted. But as we move farther into the age of machine intelligence and automated reasoning, the issue of definability is certain to grow in importance and visibility, raising basic questions that are not easy to resolve.
To be more specific, let X be the concept of, say, a summary, and assume that I am instructing a machine to generate a summary of a given article or a book. To execute my instruction, the machine must be provided with a definition of what is meant by a summary. It is somewhat paradoxical that we have summarization programs that can summarize, albeit in a narrowly prescribed sense, without being able to formulate a general definition of summarization. The same applies to the concepts of causality, randomness, and probability. Indeed, it may be argued that these and many other basic concepts cannot be defined within the conceptual framework of classical logic and set theory.
The point of departure in our approach to definability is the assumption that definability has a hierarchical structure. Furthermore, it is understood that a definition must be unambiguous, precise, operational, general, and co-extensive with the concept it defines.
In the theory of hierarchical definability (THD), a definition of X is represented as a quadruple (X, L, C, D), where L is the language in which X is defined; C is the context; and D is the definition. The context, C, delimits the class of instances to which D applies, that is, C defines the domain of D. For example, if X is the concept of volume, D may not apply to a croissant or a tree.
The language, L, is assumed to be a member of a hierarchy represented as (NL, BL, F, F.G, PNL). In this hierarchy, NL is a natural language—a language which is in predominant use as a definition language in social sciences and law; and BL is the binary-logic-based mathematical language used in all existing scientific theories. Informally, a concept, X, is BL-definable if it is a crisp concept, e.g., a prime number, a linear system or a Gaussian distribution. F is a mathematical language based on fuzzy logic and F.G is an extension of F in which variables and relations are allowed to have granular structure. The last member of the hierarchy is PNL—Precisiated Natural Language. PNL is a maximally-expressive language which subsumes BL, F and F.G. In particular, the language of fuzzy if-then rules is a sublanguage of PNL.
X is a fuzzy concept if its denotation, D(X), is a fuzzy set in its universe of discourse. A fuzzy concept is associated with a membership function that assigns to each point, u, in the universe of discourse of X, the degree to which u is a member of D(X). Alternatively, a fuzzy concept may be defined algorithmically in terms of other fuzzy concepts. Examples of fuzzy concepts are: small number, strong evidence and similarity. It should be noted that many concepts associated with fuzzy sets are crisp concepts. An example is the concept of a convex fuzzy set. Most fuzzy concepts are context-dependent.
The next level in the hierarchy is that of F.G-definability, with G standing for granular, and F.G denoting the conjunction of fuzzy and granular. Informally, in the case of a concept which is F.G-granular, the values of attributes are granulated, with a granule being a clump of values that are drawn together by indistinguishability, similarity, proximity, or functionality. F.G-granularity reflects the bounded ability of the human mind to resolve detail and store information. An example of an F.G-granular concept that is traditionally defined as a crisp concept, is that of statistical independence. This is a case of misdefinition--a definition that is applied to instances for which the concept is not defined, e.g., fuzzy events. In particular, a common misdefinition is to treat a concept as if it were BL-definable, whereas in fact it is not. In THD, the concept of context serves to differentiate between "defined" and "definability."
The next level is that of PNL-definability. Basically, PNL consists of propositions drawn from a natural language that can be precisiated through translation into what is called precisiation language. An example of a proposition in PNL is: It is very unlikely that there will be a significant increase in the price of oil in the near future.
In the case of PNL, the precisiation language is the Generalized Constraint Language (GCL). A generic generalized constraint is represented as Z isr R, where Z is the constrained variable, R is the constraining relation and r is a discrete-valued indexing variable whose values define the ways in which R constrains Z. The principal types of constraints are: possibilistic (r = blank); veristic (r = v); probabilistic (r = p); random set (r = rs); usuality (r = u); fuzzy graph (r = fg); and Pawlak set (r = ps). The rationale for constructing a large variety of constraints is that conventional crisp constraints are incapable of representing the meaning of propositions expressed in a natural language--most of which are intrinsically imprecise--in a form that lends itself to computation.
The elements of GCL are composite generalized constraints that are formed from generic generalized constraints by combination, modification and qualification. An example of a generalized constraint in GCL is ((Z isp R) and (Z, Y) is S) is unlikely.
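One possible, purely illustrative way to represent such composite constraints in software is as a small expression tree, sketched below; the node types are assumptions of the sketch, not part of GCL.

    # Sketch of a composite generalized constraint such as
    #   ((Z isp R) and (Z, Y) is S) is unlikely
    # represented as a nested expression tree. The node types are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Constraint:
        X: str
        r: str          # "", "p", "v", "rs", "u", "fg", or "ps"
        R: object

    @dataclass
    class And:
        left: object
        right: object

    @dataclass
    class Qualified:
        body: object
        qualifier: str  # e.g., a fuzzy probability such as "unlikely"

    composite = Qualified(
        body=And(Constraint("Z", "p", "R"), Constraint("(Z, Y)", "", "S")),
        qualifier="unlikely",
    )
    print(composite)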
By construction, the Generalized Constraint Language is maximally expressive. What this implies is that PNL is the largest subset of a natural language that admits precisiation. Informally, this implication serves as a basis for the conclusion that if a concept, X, cannot be defined in terms of PNL, then, in effect, it is undefinable or, synonymously, amorphic.
In this perspective, the highest level of the definability hierarchy, which is the level above PNL-definability, is that of undefinability or amorphicity. A canonical example of an amorphic concept is that of causality. More specifically, it is not possible to construct a general definition of causality such that, given any two events A and B and the question, "Did A cause B?", the question could be answered based on the definition. Equivalently, given any definition of causality, it will always be possible to construct examples to which the definition would not apply or would yield counterintuitive results. In general, definitions of causality are non-operational because of the infeasibility of conducting controlled experiments.
In dealing with an amorphic concept, X, what is possible--and what we generally do--is to restrict the domain of applicability of X to instances for which X is definable. For example, in the case of the concept of a summary, which is an amorphic concept, we could restrict the length, type, and other attributes of what we want to summarize. In this sense, an amorphic concept may be partially definable or, p-definable, for short. The concept of p-definability applies to all levels of the definability hierarchy.
The use of PNL as a definition language opens the door to definition of concepts of the form "approximate X," e.g., approximate linearity, approximate stability and, most importantly, approximate theorem. What we do not want to do is to use a BL-based language to define "approximate X," since such BL-based definitions may lead to counterintuitive conclusions.
The theory of hierarchical definability is not a theory in the traditional spirit. Its definitions are informal and its conclusions are not theorems. Nonetheless, it serves a useful purpose by raising significant questions about a basic issue--the issue of definability of concepts that lie at the center of scientific theories.
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-Mail: zadeh@cs.berkeley.edu. Research supported in part by ONR Contract N00014-99-C-0298, NASA Contract NCC2-1006, NASA Grant NAC2-117, ONR Grant N00014-96-1-0556, ONR Grant FDN0014991035, ARO Grant DAAH 04-961-0341, and the BISC Program of UC Berkeley.
Fuzzy logic has been—and to some extent still is—an object of controversy. Some are turned off by its name. But, more importantly, fuzzy logic is tolerant of imprecision and partial truth. It is this tolerance that is in conflict with the deep-seated Cartesian tradition of aiming at truth which is bivalent, with no shades of gray allowed.
There are many misconceptions about fuzzy logic. In large measure, the misconceptions reflect the fact that the term “fuzzy logic” has two distinct interpretations. More specifically, in a narrow sense, fuzzy logic is the logic of approximate reasoning. But in a wider sense—which is in dominant use today—fuzzy logic, denoted as FL, is coextensive with the theory of fuzzy sets, and contains fuzzy logic in its narrow sense as one of its branches. In fact, most applications of FL involve modes of analysis which are computational rather than logical in nature.
Fuzzy logic, FL, has four principal facets. First, the logical facet, FLl, which is fuzzy logic in its narrow sense. Second, the set-theoretic facet, FLs, which is concerned with classes having unsharp boundaries, that is, with fuzzy sets. Third, the relational facet, FLr, which is concerned with linguistic variables, fuzzy if-then rules and fuzzy relations. It is this facet that underlies almost all applications of fuzzy logic in control, decision analysis, industrial systems, and consumer products. And fourth, the epistemic facet, FLe, which is concerned with knowledge, meaning, and linguistics. One of the important branches of FLe is possibility theory.
A concept which has a position of centrality in FL is that of fuzzy granularity or, simply, f-granularity. F-granularity reflects the bounded ability of human sensory organs and, ultimately, the brain, to resolve detail and store information. In particular, human perceptions are, for the most part, f-granular in the sense that (a) the boundaries of perceived classes are fuzzy, and (b) the perceived attributes are granulated, with a granule being a clump of values drawn together by indistinguishability, similarity, proximity or functionality. In this perspective, the colors red, blue, green, etc., may be viewed as labels of granules of perception of color.
Precision carries a cost. This is the main reason why in most of its applications, the machinery of fuzzy logic is employed to exploit the tolerance for imprecision for achieving tractability, robustness and low solution cost. In fact, it is the tolerance for imprecision that underlies the remarkable human capability to perform a wide variety of physical and mental tasks, e.g., drive in city traffic, based solely on perceptions, without any measurements and any computations. It is this capability that motivated the development of fuzzy-logic-based computational theory of perceptions (CTP). Existing theories and, in particular, probability theory, do not have the capability to operate on perception-based information.
The computational theory of perceptions is a branch of the fuzzy-logic-based methodology of computing with words (CW). Development of the methodology of computing with words is an important event in the evolution of fuzzy logic. Eventually, it may lead to a radical enlargement of the role of natural languages in information processing, decision, and control.
* Professor in the Graduate School and Director, Berkeley Initiative in Soft Computing (BISC), Computer Science Division and the Electronics Research Laboratory, Department of EECS, University of California, Berkeley, CA 94720-1776; Telephone: 510-642-4959; Fax: 510-642-1712; E-Mail: zadeh@cs.berkeley.edu.
In medical imaging, the typical maximum intensity projection (MIP) method projects the maximum intensity values in a three-dimensional (3D) data set to a two-dimensional (2D) plane. However, several artifacts appear in MIP images, which keeps us from examining the exact shape and state of the region of interest (ROI). We propose a new three-dimensional rendering method called fuzzy maximum intensity projection (FMIP), which projects the higher fuzzy membership degrees of the target ROI as brighter values on a 2D plane. Since the FMIP projects the membership degree of every voxel, the FMIP can provide exact information about the ROI. The membership degree is assigned by a fuzzification algorithm, and an appropriate fuzzification enables us to examine the ROI with high accuracy. We applied the FMIP to knee CT and lumbar MR images using our own fuzzification algorithms. The FMIP of the CT meniscus is useful for diagnosing meniscal tears, and the FMIP of the MR endorrhachis is useful for diagnosing a hernia. A comparison between FMIP and MIP showed that FMIP was able to visualize the ROI effectively.
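A minimal sketch of the projection step is given below; the sigmoid fuzzification used here is an illustrative stand-in for the organ-specific fuzzification algorithms described above.

    # Sketch of fuzzy maximum intensity projection (FMIP): project, along one axis, the
    # maximum fuzzy membership degree of the target ROI rather than the raw intensity.
    # The sigmoid fuzzification is an illustrative stand-in for the developed algorithms.
    import numpy as np

    def mip(volume, axis=2):
        """Conventional MIP: maximum raw intensity along the projection axis."""
        return volume.max(axis=axis)

    def fmip(volume, membership, axis=2):
        """FMIP: maximum ROI membership degree (in [0, 1]) along the projection axis."""
        return membership(volume).max(axis=axis)

    # illustrative fuzzification: brighter voxels receive higher ROI membership
    sigmoid_membership = lambda v: 1.0 / (1.0 + np.exp(-(v - 100.0) / 15.0))

    volume = np.random.randint(0, 256, size=(64, 64, 32)).astype(float)   # toy 3D data set
    print(mip(volume).shape, fmip(volume, sigmoid_membership).shape)      # both are 64 x 64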
Toward a Computational Theory of Perceptions
(Professor Lotfi A. Zadeh)
Toward a Perception-based Theory of Probabilistic Reasoning with F-Granular Probability Distributions
(Professor Lotfi A. Zadeh)
Toward a Theory of Hierarchical Definability (THD)--Causality is Undefinable
(Professor Lotfi A. Zadeh)
(ARO) DAAH 04-961-0341, BISC Program of UC Berkeley, (NASA) NAC2-117, (NASA) NCC2-1006, (ONR) N00014-99-C-0298, (ONR) N00014-96-1-0556, and (ONR) FDN0014991035
What Is Fuzzy Logic and What Are Its Applications?
(Professor Lotfi A. Zadeh)
Fuzzy Maximum Intensity Projection (FMIP) in Medical Image Diagnosis
Yutaka Hata1
(Professor Lotfi A. Zadeh)
Berkeley Initiative in Soft Computing
1Visiting Professor, Himeji Institute of Technology