(Professors Laurent El Ghaoui and Michael I. Jordan)

(NSF) IIS-9988642 and (ONR) MURI N00014-00-1-0637

Kernel-based learning algorithms work by embedding the data into a Euclidean space and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the so-called kernel matrix, a symmetric positive semidefinite matrix that encodes the relative positions of all points. Specifying this matrix amounts to specifying the geometry of the embedding space and inducing a notion of similarity in the input space--classical model selection problems in machine learning. In this project we show how the kernel matrix can be learned from data via semidefinite programming (SDP) techniques. When applied to a kernel matrix associated with both training and test data, this gives a powerful transductive algorithm: using the labeled part of the data, one can learn an embedding for the unlabeled part as well, with the similarity between test points inferred from the training points and their labels. Importantly, these learning problems are convex, so we obtain a method for learning both the model class and the function without local minima. Finally, the novel approach presented in this project is supported by positive empirical results.
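As a minimal toy sketch (our construction, not the project's full SDP formulation): one common special case of the semidefinite program restricts the learned kernel to convex combinations of fixed base kernels and maximizes alignment with the label matrix yy'. The data, base-kernel widths, and grid search below are all our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, size=(10, 2)),
               rng.normal(+1, 0.3, size=(10, 2))])
y = np.array([1.0] * 10 + [-1.0] * 10)
target = np.outer(y, y)  # the label matrix yy'

def rbf(X, gamma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

bases = [rbf(X, g) for g in (0.01, 1.0, 100.0)]  # fixed base kernel matrices

def alignment(K):
    # cosine similarity between K and yy' in the Frobenius inner product
    return (K * target).sum() / (np.linalg.norm(K) * np.linalg.norm(target))

best_score, best_w = -1.0, None
for w1 in np.linspace(0, 1, 11):              # coarse grid on the simplex
    for w2 in np.linspace(0, 1 - w1, 6):
        w = (w1, w2, 1 - w1 - w2)
        s = alignment(sum(wi * B for wi, B in zip(w, bases)))
        if s > best_score:
            best_score, best_w = s, w
print(best_w, best_score)
```

Since any convex combination of positive semidefinite matrices is again positive semidefinite, the search never leaves the feasible set of the SDP.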

^{1}Professor, Dept. of Statistics, UC Davis

^{2}Visiting Professor, A.N.U., Canberra

(Professors Laurent El Ghaoui and Michael I. Jordan)

(NSF) IIS-9988642 and (ONR) MURI N00014-00-1-0637

In this project we consider the problem of novelty detection, presenting an algorithm that aims to find a minimal region in input space containing a fraction alpha of the probability mass underlying a data set. This algorithm, the "single-class minimax probability machine (MPM)," is built on a distribution-free methodology that minimizes the worst-case probability of a data point falling outside of a convex set, given only the mean and covariance matrix of the distribution and making no further distributional assumptions. We present a robust approach to estimating the mean and covariance matrix within the general two-class MPM setting, and show how this approach specializes to the single-class problem. We provide empirical results comparing the single-class MPM to the single-class SVM and a two-class SVM method.
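The distribution-free guarantee behind the MPM family can be illustrated with the classical multivariate Chebyshev (Marshall-Olkin) bound: for any distribution with mean mu and covariance Sigma, the probability that a'x >= b is at most 1/(1 + d^2) with d^2 = (b - a'mu)^2 / (a' Sigma a), provided b >= a'mu. A minimal sketch, with toy numbers of our choosing:

```python
import numpy as np

def worst_case_tail(a, b, mu, Sigma):
    # Worst-case P(a'x >= b) over all distributions with the given
    # mean and covariance (Marshall-Olkin / multivariate Chebyshev bound).
    margin = b - a @ mu
    if margin <= 0:
        return 1.0  # the bound is vacuous when the mean violates the halfspace
    d2 = margin ** 2 / (a @ Sigma @ a)
    return 1.0 / (1.0 + d2)

mu = np.array([0.0, 0.0])
Sigma = np.eye(2)
a = np.array([1.0, 0.0])
print(worst_case_tail(a, 2.0, mu, Sigma))  # d^2 = 4, so the bound is 1/5 = 0.2
```

The single-class MPM chooses the set so that this worst-case probability of falling outside it is controlled, using only the (robustly estimated) mean and covariance.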

(Professor Michael I. Jordan)

Microsoft Fellowship, (NSF) IIS-9988642, and (ONR) N00014-01-1-0890

We consider the problem of modeling annotated data--data with multiple types, in which an instance of one type (such as a caption) serves as a description of an instance of the other type (such as an image). We describe three hierarchical mixture models aimed at such data, culminating in the Corr-LDA model, a latent variable model that is effective at both joint clustering and automatic annotation. We conduct experiments testing these models on the Corel database of images and captions.
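A toy forward-sampling sketch of a Corr-LDA-style generative process (the dimensions and parameters below are our hypothetical choices): region types are drawn from a per-image topic mixture, region features from topic-specific Gaussians, and each caption word first picks one of the image's regions and then draws a word from that region's topic, which is what ties annotations to image content.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N, M = 3, 5, 4, 3                   # topics, vocabulary, regions, caption words
alpha = np.ones(K)                        # Dirichlet prior on topic proportions
means = np.array([-2.0, 0.0, 2.0])        # topic-specific region feature means
beta = rng.dirichlet(np.ones(V), size=K)  # topic-word distributions

theta = rng.dirichlet(alpha)              # per-image topic proportions
z = rng.choice(K, size=N, p=theta)        # topic assignment of each region
r = rng.normal(means[z], 0.5)             # region features given topics
y = rng.integers(0, N, size=M)            # each caption word picks a region
w = np.array([rng.choice(V, p=beta[z[yi]]) for yi in y])  # word from that topic
print(z, w)
```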

(Professor Michael I. Jordan)

Intel Corporation, (NSF) IIS-9988642, and (ONR/MURI) N00014-00-1-0637

Independent component analysis (ICA) is a recent statistical method for revealing hidden factors of sets of random variables, where the hidden factors (the components) are assumed to be statistically independent. It has been successfully applied to problems such as audio blind source separation--the so-called "cocktail party" problem, where streams of overlapping speech have to be separated according to the individual speakers--and biomedical data processing.

Kernel methods, as exemplified by support vector machines, are a novel paradigm for pattern recognition. They enable one to efficiently extend well-known and well-studied linear techniques into nonlinear ones.

The object of our work is to use kernel methods to perform ICA. We show how, by applying the classical multivariate statistical technique of canonical correlations to a kernel space, we obtain a family of nonlinear ICA algorithms. On synthetic examples, our "Kernel-ICA" algorithms outperform many of the known algorithms for ICA.
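A simplified sketch of the idea (our variant, not the exact Kernel-ICA eigenproblem): for two candidate source estimates, compute a regularized kernel canonical correlation from centered Gram matrices; values near zero indicate independence, so the quantity can serve as an ICA contrast. The RBF width and regularizer kappa below are our arbitrary choices.

```python
import numpy as np

def gram_rbf(s, sigma=1.0):
    d2 = (s[:, None] - s[None, :]) ** 2
    K = np.exp(-d2 / (2 * sigma ** 2))
    n = len(s)
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H                      # doubly centered Gram matrix

def first_kcca(s1, s2, kappa=0.1):
    K1, K2 = gram_rbf(s1), gram_rbf(s2)
    n = len(s1)
    # (K + kappa*n*I)^{-1} K shares K's eigenbasis, so A and B are PSD
    # with eigenvalues in [0, 1); their product has real eigenvalues.
    A = np.linalg.solve(K1 + kappa * n * np.eye(n), K1)
    B = np.linalg.solve(K2 + kappa * n * np.eye(n), K2)
    lam = np.linalg.eigvals(A @ B).real.max()
    return np.sqrt(max(lam, 0.0))         # correlation-like score in [0, 1)

rng = np.random.default_rng(0)
s1 = rng.uniform(-1, 1, 200)
rho_ind = first_kcca(s1, rng.uniform(-1, 1, 200))  # independent pair: small
rho_dep = first_kcca(s1, s1 ** 2)                  # dependent pair: larger
print(rho_ind, rho_dep)
```

Minimizing such a contrast over rotations of whitened data is what turns the correlation measure into an ICA algorithm.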

- [1] F. R. Bach and M. I. Jordan, "Kernel Independent Component Analysis," *J. Machine Learning Research,* Vol. 3, 2002.
- [2] A. Hyvarinen, J. Karhunen, and E. Oja, *Independent Component Analysis,* John Wiley and Sons, 2001.
- [3] B. Scholkopf and A. J. Smola, *Learning with Kernels,* MIT Press, 2001.

(Professor Michael I. Jordan)

Intel Corporation, (NSF) IIS-9988642, and (ONR/MURI) N00014-00-1-0637

We present a class of algorithms for learning the structure of graphical models from data. The algorithms are based on a measure known as the kernel generalized variance (KGV), which essentially allows us to treat all variables on equal footing as Gaussians in a feature space obtained from Mercer kernels. Thus we are able to learn hybrid graphs involving discrete and continuous variables of arbitrary type. We explore the computational properties of our approach, showing how to use the kernel trick to compute the relevant statistics in linear time. We illustrate our framework with experiments involving discrete and continuous data.
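As a minimal illustration (our toy example, using ordinary Gaussians rather than the feature-space covariances of the KGV): for variables treated as jointly Gaussian, the mutual information between two blocks is -1/2 log( det(Sigma) / (det(Sigma_11) det(Sigma_22)) ), and this is the determinant-ratio score that the KGV evaluates on Mercer-kernel feature spaces to put all variable types on equal footing.

```python
import numpy as np

def gaussian_mi(X, Y):
    # Gaussian mutual information between two blocks of variables,
    # computed from the determinants of the joint and marginal covariances.
    Z = np.hstack([X, Y])
    S = np.cov(Z, rowvar=False)
    d1 = X.shape[1]
    S11, S22 = S[:d1, :d1], S[d1:, d1:]
    return -0.5 * np.log(np.linalg.det(S) /
                         (np.linalg.det(S11) * np.linalg.det(S22)))

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1))
noise = rng.normal(size=(1000, 1))
mi_dep = gaussian_mi(x, x + 0.5 * noise)  # dependent pair: large score
mi_ind = gaussian_mi(x, noise)            # independent pair: near zero
print(mi_dep, mi_ind)
```

Scores of this form can then be used to weigh candidate edges when searching over graph structures.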

- [1] F. R. Bach and M. I. Jordan, "Learning Graphical Models with Mercer Kernels," *NIPS* (to appear).

(Professor Michael I. Jordan)

Intel Corporation, (NSF) IIS-9988642, and (ONR/MURI) N00014-00-1-0637

We propose a generalization of independent component analysis (ICA), where instead of looking for a linear transform that makes the data components independent, we look for a transform that makes the data components well fit by a tree-structured graphical model. Treating the problem as a semiparametric statistical problem, we show that the optimal transform is found by minimizing a contrast function based on mutual information, a function that directly extends the contrast function used for classical ICA. We provide two approximations of this contrast function, one using kernel density estimation, and another using kernel generalized variance.

This tree-dependent component analysis framework leads naturally to an efficient general multivariate density estimation technique where only bivariate density estimation needs to be performed.
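The bivariate reduction above can be sketched with the classical Chow-Liu step (our toy illustration, with histogram density estimates standing in for the kernel estimators): estimate pairwise mutual information from bivariate histograms, then take a maximum-weight spanning tree over the components.

```python
import numpy as np
from itertools import combinations

def pairwise_mi(a, b, bins=8):
    # Plug-in mutual information from a bivariate histogram.
    h, _, _ = np.histogram2d(a, b, bins=bins)
    p = h / h.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return (p[nz] * np.log(p[nz] / (px @ py)[nz])).sum()

def chow_liu_tree(S):
    # Maximum-weight spanning tree over MI weights (Kruskal with union-find).
    n = S.shape[1]
    edges = sorted(((pairwise_mi(S[:, i], S[:, j]), i, j)
                    for i, j in combinations(range(n), 2)), reverse=True)
    parent, tree = list(range(n)), []
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

rng = np.random.default_rng(0)
a = rng.normal(size=2000)
b = a + 0.1 * rng.normal(size=2000)   # b strongly depends on a
c = rng.normal(size=2000)             # c is independent of both
tree = chow_liu_tree(np.column_stack([a, b, c]))
print(tree)  # the strongest edge links components 0 and 1
```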

- [1] F. R. Bach and M. I. Jordan, "Tree-dependent Component Analysis," *Uncertainty in Artificial Intelligence Conf. Proc.,* Edmonton, Canada, August 2002.

(Professor Michael I. Jordan)

(ONR) N00014-00-1-0758

In recent years, high-throughput biological techniques have been developed to collect data on the genomic response of an organism to stress. We employ DNA microarray technology to obtain a survival snapshot of haploinsufficient Saccharomyces cerevisiae (baker's yeast) in the presence of several antifungal drugs, anticancer drugs, and environmental stresses. Using well-known statistical and machine learning techniques, we identify protein targets and side effects of these stresses with a measure of confidence. This information is further used to hypothesize previously unknown protein functions, molecular and genetic pathways, and drug interactions.

- [1] G. Giaever et al., "Functional Profiling of the Saccharomyces cerevisiae Genome," *Nature,* July 2002.

(Professors S. Shankar Sastry and Michael I. Jordan)

We are considering the problem of evaluating the performance of Kalman filtering techniques on a sensor network, a setting in which communication delay and loss of data cannot be neglected. Starting from the discrete-time Kalman filtering formulation and making use of graphical models, we have made a first attempt to generalize the Kalman filter to take into account a model of the communication across the sensor network.
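A minimal sketch of the kind of model involved (our illustration, with hypothetical scalar parameters): a discrete-time Kalman filter in which each measurement reaches the estimator only with probability p, modeling packet loss on the network; when a measurement is dropped, only the time update runs.

```python
import numpy as np

rng = np.random.default_rng(0)
A, C, Q, R, p = 1.0, 1.0, 0.1, 0.5, 0.7   # dynamics, observation, noises, delivery prob.

x, xhat, P = 0.0, 0.0, 1.0
errs = []
for t in range(500):
    x = A * x + rng.normal(0, np.sqrt(Q))          # true state evolves
    xhat, P = A * xhat, A * P * A + Q              # time update (always runs)
    if rng.random() < p:                           # was the packet delivered?
        y = C * x + rng.normal(0, np.sqrt(R))
        K = P * C / (C * P * C + R)                # Kalman gain
        xhat, P = xhat + K * (y - C * xhat), (1 - K * C) * P
    errs.append((x - xhat) ** 2)
print(np.mean(errs))
```

Varying p exposes the degradation of the error covariance as the network becomes less reliable, which is the kind of performance question the project addresses.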