Published on Thu Oct 23 2014

Attribute Efficient Linear Regression with Data-Dependent Sampling

Doron Kukliansky, Ohad Shamir

Abstract

In this paper we analyze a budgeted learning setting, in which the learner can only choose and observe a small subset of the attributes of each training example. We develop efficient algorithms for Ridge and Lasso linear regression, which utilize the geometry of the data by a novel data-dependent sampling scheme. When the learner has prior knowledge on the second moments of the attributes, the optimal sampling probabilities can be calculated precisely, and result in data-dependent improvement factors for the excess risk over the state-of-the-art that may be as large as $O(\sqrt{d})$, where $d$ is the problem's dimension. Moreover, under reasonable assumptions our algorithms can use fewer attributes than full-information algorithms, which is the main concern in budgeted learning settings. To the best of our knowledge, these are the first algorithms able to do so in our setting. When no such prior knowledge is available, we develop a simple estimation technique that, given a sufficient amount of training examples, achieves similar improvements. We complement our theoretical analysis with experiments on several data sets which support our claims.
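
To make the idea concrete, here is a minimal sketch of the kind of data-dependent attribute sampling the abstract describes; it is not the paper's exact algorithm. We assume sampling probabilities proportional to the square roots of the attributes' second moments (a natural choice for reducing the variance of an inverse-probability-weighted estimate), with a plug-in moment estimate standing in for the paper's estimation technique. All function names are ours.

```python
import numpy as np

# Hedged sketch of data-dependent attribute sampling: observe only a
# budget of k attributes per example (in expectation) and reweight so
# the sparse observation is an unbiased estimate of the full example.

def sampling_probs(second_moments, k):
    """Assumed scheme: p_j proportional to sqrt(E[x_j^2]), scaled to sum
    to the per-example attribute budget k, and capped at 1."""
    p = np.sqrt(second_moments)
    p = k * p / p.sum()
    return np.minimum(p, 1.0)

def observe(x, p, rng):
    """Keep attribute j with probability p[j]; divide kept entries by
    p[j] so that E[x_hat] = x (inverse-probability weighting)."""
    mask = rng.random(x.shape) < p
    return np.where(mask, x / p, 0.0)

# Toy usage: when no prior knowledge of the moments is available,
# estimate them from a few fully observed examples (a plug-in stand-in
# for the paper's estimation technique), then run budgeted observation.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10)) * np.linspace(0.1, 3.0, 10)
moments = (X[:50] ** 2).mean(axis=0)   # plug-in estimate of E[x_j^2]
p = sampling_probs(moments, k=3)
x_hat = observe(X[100], p, rng)        # sparse, unbiased view of X[100]
```

Capping the probabilities at 1 keeps the inverse weights valid at the cost of a slightly smaller expected budget; the paper's exact handling of the budget constraint may differ.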

Tue Feb 09 2016
Machine Learning
Online Active Linear Regression via Thresholding
We consider the problem of online active learning to collect data for regression modeling. Our main contribution is a novel threshold-based algorithm for selection of most informative observations. We extend the algorithm and its guarantees to sparse linear regression in high-dimensional settings.
Wed Mar 07 2018
Machine Learning
Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain
We revisit the problem of linear regression under a differential privacy constraint. We propose simple modifications of two existing DP algorithms. Both AdaOPS and AdaSSP outperform the existing techniques on nearly all data sets.
Thu Jun 20 2019
Machine Learning
Online A-Optimal Design and Active Linear Regression
The key challenge of this work lies in the heteroscedasticity assumption. The goal of the decision maker is then to figure out on the fly the optimal way to allocate the total budget of samples. Combining techniques from bandit and convex optimization, we propose a new active sampling algorithm.
Wed May 19 2021
Machine Learning
L1 Regression with Lewis Weights Subsampling
We consider the problem of finding an approximate solution to $\ell_1$ regression while only observing a small number of labels. Given an unlabeled data matrix $A$, we must choose a small set of rows to observe.
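
As background on the technique named in the title, the following is a hedged sketch of the standard fixed-point iteration for $\ell_1$ Lewis weights (in the style of Cohen and Peng), which yields a row-sampling distribution for choosing whose labels to observe. The paper's reweighting and guarantees are not reproduced here; variable names and the fixed iteration count are ours.

```python
import numpy as np

def l1_lewis_weights(A, n_iter=30):
    """Fixed-point iteration for l1 Lewis weights:
    w_i <- (a_i^T (A^T diag(1/w) A)^{-1} a_i)^{1/2}."""
    n, d = A.shape
    w = np.ones(n)
    for _ in range(n_iter):
        M = A.T @ (A / w[:, None])                   # A^T diag(1/w) A
        Minv = np.linalg.inv(M)
        lev = np.einsum('ij,jk,ik->i', A, Minv, A)   # a_i^T M^{-1} a_i
        w = np.sqrt(lev)
    return w   # for l1, the weights sum to approximately d

# Usage: query labels of rows sampled proportionally to their weights.
A = np.random.default_rng(1).normal(size=(200, 5))
w = l1_lewis_weights(A)
probs = w / w.sum()
```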
Tue Aug 23 2011
Machine Learning
Optimal Algorithms for Ridge and Lasso Regression with Partially Observed Attributes
We present simple and efficient algorithms for these problems. For Lasso and Ridge regression they need the same total number of attributes as do full-information algorithms. For support-vector regression we require exponentially fewer attributes compared to the state of the art.
Tue Aug 14 2018
Machine Learning
Adaptive Sampling for Convex Regression
In this paper, we introduce the first principled adaptive-sampling procedure for learning a convex function in the $L_\infty$ norm. We also corroborate our theoretical contributions with numerical experiments.