Electrical Engineering
      and Computer Sciences

Electrical Engineering and Computer Sciences

COLLEGE OF ENGINEERING

UC Berkeley

   

Research Projects

High-dimensional analysis of semidefinite relaxations for sparse principal components\

Martin Wainwright

Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the ``large $\pdim$, small $\numobs$'' setting, in which the problem dimension $\pdim$ is comparable to or larger than the sample size $\numobs$. This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say with at most $\kdim$ non-zero components. We consider a spiked covariance model in which a base matrix is perturbed by adding a $\kdim$-sparse maximal eigenvector, and analyze two computationally tractable methods for recovering the support set of this maximal eigenvector: (a) a simple diagonal thresholding method, which transitions from success to failure as a function of the order parameter $\thetadiag(\numobs, \pdim, \kdim) = \numobs/[\kdim^2 \log(\pdim - \kdim)]$; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the order parameter $\thetasdp(\numobs, \pdim, \kdim) = \numobs/[\kdim \log(\pdim - \kdim)]$ is larger than a critical threshold. In addition, we prove that no method, including the best method which has exponential-time complexity, can succeed in recovering the support if the order parameter $\thetasdp(\numobs, \pdim, \kdim)$ is below a threshold. Our results thus highlight an interesting trade-off between computational and statistical efficiency in high-dimensional inference.