Published on Sat Jan 12 2013

Robust High Dimensional Sparse Regression and Matching Pursuit

Yudong Chen, Constantine Caramanis, Shie Mannor

We develop algorithms for support recovery in sparse regression. We are interested in understanding how many outliers, $n_1$, we can tolerate while identifying the correct support. We also show that the natural brute force algorithm that searches over all subsets of covariate/response pairs is unable to correctly identify the support with even $n_1 = O(n/k)$ corrupted points, where $k$ is the sparsity.

Abstract

We consider high dimensional sparse regression, and develop strategies able to deal with arbitrary -- possibly, severe or coordinated -- errors in the covariance matrix $X$. These may come from corrupted data, persistent experimental errors, or malicious respondents in surveys/recommender systems, etc. Such non-stochastic error-in-variables problems are notoriously difficult to treat, and as we demonstrate, the problem is particularly pronounced in high-dimensional settings where the primary goal is {\em support recovery} of the sparse regressor. We develop algorithms for support recovery in sparse regression, when some number $n_1$ out of $n + n_1$ total covariate/response pairs are {\it arbitrarily (possibly maliciously) corrupted}. We are interested in understanding how many outliers, $n_1$, we can tolerate, while identifying the correct support. To the best of our knowledge, neither standard outlier rejection techniques, nor recently developed robust regression algorithms (that focus only on corrupted response variables), nor recent algorithms for dealing with stochastic noise or erasures, can provide guarantees on support recovery. Perhaps surprisingly, we also show that the natural brute force algorithm that searches over all subsets of covariate/response pairs, and all subsets of possible support coordinates in order to minimize regression error, is remarkably poor, unable to correctly identify the support with even $n_1 = O(n/k)$ corrupted points, where $k$ is the sparsity. This is true even in the basic setting we consider, where all authentic measurements and noise are independent and sub-Gaussian. In this setting, we provide a simple algorithm -- no more computationally taxing than OMP -- that gives stronger performance guarantees, recovering the support while tolerating more corrupted points, with an explicit bound in terms of $n$, $k$, and the dimension $p$ of the signal to be recovered.
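
For concreteness, here is a minimal numpy sketch of the classical (non-robust) OMP baseline that the abstract benchmarks against; the function name and interface are illustrative, and this is the textbook greedy procedure, not the authors' robust variant.

```python
# Textbook Orthogonal Matching Pursuit (OMP): the non-robust baseline
# referenced in the abstract. Illustrative sketch only -- not the
# authors' robust algorithm.
import numpy as np

def omp(X, y, k):
    """Greedily pick k columns of X to explain y; return (support, beta)."""
    n, p = X.shape
    support = []
    residual = y.copy()
    beta_s = np.zeros(0)
    for _ in range(k):
        # Choose the column most correlated with the current residual.
        scores = np.abs(X.T @ residual)
        scores[support] = -np.inf          # never re-pick a chosen column
        support.append(int(np.argmax(scores)))
        # Refit by least squares on the selected columns, update residual.
        beta_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ beta_s
    beta = np.zeros(p)
    beta[support] = beta_s
    return sorted(support), beta
```

The abstract's claim is that robustness to arbitrarily corrupted covariate/response pairs can be obtained by an algorithm no more computationally taxing than this loop.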

Tue Jun 05 2012
Machine Learning
Orthogonal Matching Pursuit with Noisy and Missing Data: Low and High Dimensional Results
This paper develops efficient OMP-like algorithms for precisely this setting of noisy and missing data. In the high-dimensional setting, our support-recovery algorithm requires no knowledge of even the statistics of the noise.
Tue May 29 2018
Machine Learning
High Dimensional Robust Sparse Regression
We provide a novel -- and to the best of our knowledge, the first -- algorithm for high dimensional sparse regression. Our algorithm recovers the true sparse parameters with sub-linear sample complexity. We demonstrate its effectiveness on large-scale sparse regression problems with arbitrary corruptions.
Thu Jun 07 2012
Machine Learning
A New Greedy Algorithm for Multiple Sparse Regression
This paper proposes a new algorithm for multiple sparse regression in high dimensions. The task is to estimate support and values of several sparse vectors from a few noisy linear measurements. Our algorithm is a "forward-backward" greedy procedure that operates on two distinct classes of objects.
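
The teaser above describes the method only at a high level. Below is a hedged sketch of a generic forward-backward greedy for a single sparse regression, illustrating just the "forward-backward" idea; the paper's actual procedure handles multiple regressions jointly and operates on two distinct classes of objects, which this sketch does not attempt. The helper names and the backward threshold `nu` are assumptions.

```python
# Generic forward-backward greedy selection for one sparse regression.
# Sketch of the "forward-backward" idea only; the paper's algorithm
# extends it to multiple regressions over two classes of objects.
import numpy as np

def resid(X, y, support):
    """Least-squares residual norm on the given support."""
    if not support:
        return float(np.linalg.norm(y))
    beta, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    return float(np.linalg.norm(y - X[:, support] @ beta))

def foba(X, y, k_max, nu=0.5, tol=1e-8):
    """Assumes k_max <= X.shape[1]; `nu` is an illustrative parameter."""
    support = []
    while len(support) < k_max:
        # Forward step: add the feature with the largest residual reduction.
        before = resid(X, y, support)
        gain, j = max((before - resid(X, y, support + [j]), j)
                      for j in range(X.shape[1]) if j not in support)
        if gain <= tol:
            break
        support.append(j)
        # Backward step: drop an earlier feature whose contribution is
        # much smaller than the gain just obtained.
        if len(support) > 1:
            now = resid(X, y, support)
            loss, drop = min((resid(X, y, [s for s in support if s != t]) - now, t)
                             for t in support if t != j)
            if loss < nu * gain:
                support.remove(drop)
    return sorted(support)
```
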
Sun Nov 22 2020
Machine Learning
Online Orthogonal Matching Pursuit
Thu Dec 05 2013
Machine Learning
Swapping Variables for High-Dimensional Sparse Regression with Correlated Measurements
SWAP is a simple greedy algorithm that swaps variables until convergence. It is surprisingly effective in handling measurement matrices with high correlations. SWAP can be used to boost the performance of any sparse regression algorithm.
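
As an illustration of the swapping idea, here is a minimal local-search refinement that exchanges one selected variable for one unselected variable whenever the swap strictly lowers the least-squares residual. This is a sketch under assumptions (made-up helper names and tolerance), not the authors' implementation.

```python
# Minimal swap-style local search in the spirit of SWAP: exchange one
# in-support variable for one out-of-support variable while the swap
# strictly reduces the least-squares residual. Illustrative sketch only.
import numpy as np

def swap_refine(X, y, support):
    support = list(support)

    def resid(s):
        beta, *_ = np.linalg.lstsq(X[:, s], y, rcond=None)
        return float(np.linalg.norm(y - X[:, s] @ beta))

    best = resid(support)
    improved = True
    while improved:                         # sweep until no improving swap
        improved = False
        for i in range(len(support)):
            for j in range(X.shape[1]):
                if j in support:
                    continue
                trial = support.copy()
                trial[i] = j                # swap variable i out, j in
                r = resid(trial)
                if r < best - 1e-12:        # accept strictly improving swaps
                    support, best, improved = trial, r, True
    return sorted(support)
```

Consistent with the snippet above, a refinement of this form can be run on top of the output of any sparse regression algorithm, e.g. OMP or the lasso.
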
Mon Oct 17 2011
Machine Learning
Joint variable and rank selection for parsimonious estimation of high-dimensional matrices
We propose dimension reduction methods for sparse, high-dimensional multivariate response regression models. We introduce estimators that are based on penalized least squares. We prove that these estimators adapt to the unknown matrix sparsity and have fast rates of convergence.