# 2009 Research Summary

## High-Dimensional Analysis of Semidefinite Relaxations for Sparse Principal Components

View Current Project Information

Martin Wainwright

Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the "large *p*, small *n*" setting, in which the problem dimension *p* is comparable to or larger than the sample size *n*. This project studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say with at most *k* non-zero components.

We consider a spiked covariance model in which a base matrix is
perturbed by adding a *k*-sparse maximal eigenvector, and analyze
two computationally tractable methods for recovering the support set
of this maximal eigenvector: (a) a simple diagonal thresholding method, which transitions from success to failure as a function of the order parameter θ_{dia}(*n*, *p*, *k* = *n*/[*k*^{2}log (*p* - *k*)]; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the order parameter
θ_{sdp}(*n*, *p*, *k*) = *n*/[*k*log(*p*-*k*)] is larger than a critical threshold. In addition, we prove that no method, including the best method which has exponential-time complexity, can succeed in recovering the support if the order parameter θ_{sdp}(*n*, *p*, *k*) is below a threshold. Our results thus highlight an interesting tradeoff between computational and statistical efficiency in high-dimensional
inference.