Published on Mon Oct 15 2018

Robust descent using smoothed multiplicative noise

Matthew J. Holland

Abstract

To improve the off-sample generalization of classical procedures minimizing the empirical risk under potentially heavy-tailed data, new robust learning algorithms have been proposed in recent years, with generalized median-of-means strategies being particularly salient. These procedures enjoy performance guarantees in the form of sharp risk bounds under weak moment assumptions on the underlying loss, but typically suffer from a large computational overhead and substantial bias when the data happens to be sub-Gaussian, limiting their utility. In this work, we propose a novel robust gradient descent procedure which makes use of a smoothed multiplicative noise applied directly to observations before constructing a sum of soft-truncated gradient coordinates. We show that the procedure has competitive theoretical guarantees, with the major advantage of a simple implementation that does not require an iterative sub-routine for robustification. Empirical tests reinforce the theory, showing more efficient generalization over a much wider class of data distributions.
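As a rough illustration of the kind of update the abstract describes, below is a minimal NumPy sketch of gradient descent in which each gradient coordinate is aggregated by rescaling per-sample gradients, passing them through a soft-truncation function, and averaging. The particular truncation function `psi`, the dispersion-based scale heuristic, and the toy least-squares example are illustrative assumptions; in particular, the sketch omits the multiplicative-noise smoothing that distinguishes the paper's estimator.

```python
# Illustrative sketch only: coordinate-wise soft-truncated gradient aggregation
# in the spirit of the abstract, not the paper's exact construction.
import numpy as np

def psi(u):
    """Catoni-style soft truncation: roughly linear near 0, logarithmic in the tails."""
    return np.sign(u) * np.log1p(np.abs(u) + 0.5 * u**2)

def robust_gradient(per_sample_grads, scale=None):
    """Soft-truncated coordinate-wise mean of an (n, d) array of per-sample gradients."""
    n, d = per_sample_grads.shape
    if scale is None:
        # per-coordinate scale from the empirical dispersion (assumed heuristic)
        scale = per_sample_grads.std(axis=0) * np.sqrt(n) + 1e-12
    return scale * np.mean(psi(per_sample_grads / scale), axis=0)

def robust_gd(grad_fn, w0, data, step=0.1, iters=100):
    """Plain gradient descent driven by the robust gradient estimate."""
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        g = robust_gradient(grad_fn(w, data))  # (n, d) -> (d,)
        w -= step * g
    return w

# Toy usage: least-squares regression with heavy-tailed (Student-t) noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_t(df=2, size=500)

def sq_loss_grads(w, data):
    X, y = data
    residual = X @ w - y          # (n,)
    return residual[:, None] * X  # per-sample gradients, shape (n, d)

w_hat = robust_gd(sq_loss_grads, np.zeros(3), (X, y))
print(w_hat)
```

Because `psi` is close to the identity near zero, the update behaves like ordinary averaged gradient descent on well-behaved data, while heavy-tailed per-sample gradients are damped only logarithmically rather than discarded.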

Related papers

Thu Oct 27 2016
Machine Learning
Statistical Inference for Model Parameters in Stochastic Gradient Descent
The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data. We investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain smoothness conditions.
Wed Mar 25 2020
Machine Learning
Dimension Independent Generalization Error by Stochastic Gradient Descent
Many overparameterized models, such as neural networks, perform very well in practice, although they are often trained with simple online methods. We show that the generalization error of stochastic gradient descent solutions for convex and locally convex loss functions does not depend on the ambient dimension.
Thu Jun 01 2017
Machine Learning
Efficient learning with robust gradient descent
We introduce a procedure which constructs a robust approximation of the risk gradient for use in an iterative learning routine. Using high-probability bounds on the excess risk, we show that our update does not deviate far from the ideal gradient-based update.
Mon Feb 19 2018
Artificial Intelligence
Robust Estimation via Robust Gradient Estimation
We provide a new computationally-efficient class of estimators for risk minimization. We show that these estimators are robust for general statistical models. Our workhorse is a novel robust variant of gradient descent.
Mon May 04 2020
Machine Learning
High-Dimensional Robust Mean Estimation via Gradient Descent
We study the problem of high-dimensional robust mean estimation in the presence of a constant fraction of adversarial outliers. We show that a natural non-convex formulation of the problem can be solved directly by gradient descent. Our approach leverages a novel structural lemma.
Mon May 27 2019
Machine Learning
One Method to Rule Them All: Variance Reduction for Data, Parameters and Many New Methods
We propose a remarkably general variance-reduced method suitable for solving empirical risk minimization problems. We provide a single theorem establishing linear convergence of the method under smoothness and quasi strong convexity assumptions. With this theorem we recover best-known and sometimes improved rates for known methods.