# Low-Dimensional Models for PCA and Regression

### Dapo Omidiran

###
EECS Department

University of California, Berkeley

Technical Report No. UCB/EECS-2013-194

December 1, 2013

### http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-194.pdf

This thesis examines two separate statistical problems for which low-dimensional models are effective.

In the first part of this thesis, we examine the Robust Principal Components Analysis (RPCA) problem: given a matrix $\datam$ that is the sum of a low-rank matrix $\lowopt$ and a sparse noise matrix $\sparseopt$, recover $\lowopt$ and $\sparseopt$. This problem appears in various settings, including image processing, computer vision, and graphical models. Various polynomial-time heuristics and algorithms have been proposed to solve this problem. We introduce a block coordinate descent algorithm for this problem and prove a convergence result. In addition, our iterative algorithm has low complexity per iteration and empirically performs well on synthetic datasets.

In the second part of this thesis, we examine a variant of ridge regression: unlike in the classical setting where we know that the parameter of interest lies near a single point, we instead only know that it lies near a known low-dimensional subspace. We formulate this regression problem as a convex optimization problem, and introduce an efficient block coordinate descent algorithm for solving it. We demonstrate that this ``subspace prior" version of ridge regression is an appropriate model for understanding player effectiveness in basketball. In particular, we apply our algorithm to real-world data and demonstrate empirically that it produces a more accurate model of player effectiveness by showing that (1) the algorithm outperforms existing approaches and (2) it leads to a profitable betting strategy.

**Advisor:** Laurent El Ghaoui

BibTeX citation:

@phdthesis{Omidiran:EECS-2013-194, Author = {Omidiran, Dapo}, Title = {Low-Dimensional Models for PCA and Regression}, School = {EECS Department, University of California, Berkeley}, Year = {2013}, Month = {Dec}, URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-194.html}, Number = {UCB/EECS-2013-194}, Abstract = {This thesis examines two separate statistical problems for which low-dimensional models are effective. In the first part of this thesis, we examine the Robust Principal Components Analysis (RPCA) problem: given a matrix $\datam$ that is the sum of a low-rank matrix $\lowopt$ and a sparse noise matrix $\sparseopt$, recover $\lowopt$ and $\sparseopt$. This problem appears in various settings, including image processing, computer vision, and graphical models. Various polynomial-time heuristics and algorithms have been proposed to solve this problem. We introduce a block coordinate descent algorithm for this problem and prove a convergence result. In addition, our iterative algorithm has low complexity per iteration and empirically performs well on synthetic datasets. In the second part of this thesis, we examine a variant of ridge regression: unlike in the classical setting where we know that the parameter of interest lies near a single point, we instead only know that it lies near a known low-dimensional subspace. We formulate this regression problem as a convex optimization problem, and introduce an efficient block coordinate descent algorithm for solving it. We demonstrate that this ``subspace prior" version of ridge regression is an appropriate model for understanding player effectiveness in basketball. In particular, we apply our algorithm to real-world data and demonstrate empirically that it produces a more accurate model of player effectiveness by showing that (1) the algorithm outperforms existing approaches and (2) it leads to a profitable betting strategy.} }

EndNote citation:

%0 Thesis %A Omidiran, Dapo %T Low-Dimensional Models for PCA and Regression %I EECS Department, University of California, Berkeley %D 2013 %8 December 1 %@ UCB/EECS-2013-194 %U http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-194.html %F Omidiran:EECS-2013-194