CS 298-2
Theory Seminar

Michael W. Mahoney
Yahoo! Research

A relative-error CUR decomposition for matrices and its data applications

Monday, May 8, 2006
4pm-5pm
The Wozniak Lounge, on the fourth floor of Soda Hall



Much recent work in the Theoretical Computer Science, Linear Algebra, and
Machine Learning has considered matrix decompositions of the following form:
given an m-by-n matrix A, decompose it as a product of three matrices, C, U,
and R, where C consists of a few (typically a constant number of) columns of
A, R consists of a few (typically a constant number of) rows of A, and U is
a small carefully constructed matrix that guarantees that the product CUR is
"close" to A. Applications of such decompositions include the computation of
compressed "sketches" for large matrices in a pass-efficient manner, matrix
reconstruction, speeding up kernel-based statistical learning computations,
sparsity-preservation in low-rank approximations, and improved
interpretability of data analysis methods. In this talk we shall discuss
various choices for the matrices C, U, and R that are appropriate in
different application domains. The main result will be an algorithm that
computes matrices C, U, and R such that the (Frobenius) norm of the error
matrix A - CUR is almost minimal. We will also discuss applications of this
algorithm (and related heuristics) in recommendation system analysis as well
as the analysis of DNA microarray data, DNA SNP data, and medical imaging
data.