# 2009 Research Summary

## High-Dimensional Statistical Inference, Sparse Modeling


Bin Yu and Martin Wainwright

National Science Foundation

This research consists of four closely related research thrusts, all centered on the common goal of an integrated treatment of the statistical and computational issues that arise with high-dimensional data sets in information technology (IT).

The first two research thrusts focus on fundamental issues in the design of penalty-based and other algorithmic methods for regularization. Key open problems to be addressed include the link between regularization methods and sparsity, consistency and other theoretical properties, and structured regularization methods for model selection. Sparse models are desirable both for scientific reasons (e.g., interpretability) and for computational reasons (e.g., efficiency of performing classification or regression).

The third research thrust focuses on problems of statistical inference in decentralized settings, which are of increasing importance for a broad variety of IT applications such as wireless sensor networks, computer server "farms," and traffic monitoring systems. The key challenge is designing suitable data compression schemes. On the one hand, these schemes should respect the decentralization requirements imposed by the system (e.g., limited power or bandwidth for communicating data); on the other hand, they should be (near-)optimal with respect to a statistical criterion of merit (e.g., Bayes error for a classification task, or mean-squared error (MSE) for a regression or smoothing problem).

The fourth research thrust addresses statistical issues centered on Markov random fields, which are widely used to model large collections of interacting random variables, and on the associated variational methods for approximating moments and likelihoods in such models.
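To make the link between penalty-based regularization and sparsity concrete, here is a minimal sketch of one standard example, the lasso (least squares with an ℓ1 penalty), fit by cyclic coordinate descent. This is a generic textbook illustration, not the investigators' method. The function names `soft_threshold` and `lasso_cd`, the 1/(2n) scaling of the objective, and the toy Hadamard design are all assumptions chosen for illustration. The soft-thresholding step is what sets small coefficients exactly to zero rather than merely shrinking them.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|x|: shrink z toward 0, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of p floats); y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # r = y - X @ beta, and beta starts at zero
    # (1/n) * squared norm of each column
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # (1/n) * X_j^T (partial residual with coordinate j's contribution added back)
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep r = y - X @ beta up to date
                beta[j] = new_bj
    return beta


# Hypothetical toy design: orthogonal 4x4 Hadamard columns.
X_demo = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
    [1.0, -1.0, -1.0, 1.0],
]
true_beta = [3.0, -2.0, 0.05, 0.0]  # one coefficient is small but nonzero
y_demo = [sum(row[j] * true_beta[j] for j in range(4)) for row in X_demo]

print(lasso_cd(X_demo, y_demo, lam=0.1))
```

With an orthogonal design like this one, the lasso estimate is simply each least-squares coefficient soft-thresholded at `lam`, so the call returns approximately `[2.9, -1.9, 0.0, 0.0]`: the two large coefficients are shrunk by 0.1, while the coefficient 0.05 (below the threshold) is set exactly to zero. Raising `lam` zeroes out more coefficients.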