To improve the off-sample generalization of classical procedures minimizing
the empirical risk under potentially heavy-tailed data, new robust learning
algorithms have been proposed in recent years, with generalized median-of-means
strategies being particularly salient. These procedures enjoy performance
guarantees in the form of sharp risk bounds under weak moment assumptions on
the underlying loss, but typically suffer from a large computational overhead
and substantial bias when the data happens to be sub-Gaussian, limiting their
utility. In this work, we propose a novel robust gradient descent procedure
that applies smoothed multiplicative noise directly to the observations
before constructing a sum of soft-truncated gradient coordinates.
We show that the procedure has competitive theoretical guarantees, with the
major advantage of a simple implementation that does not require an iterative
sub-routine for robustification. Empirical tests reinforce the theory, showing
more efficient generalization over a much wider class of data distributions.
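
As a rough illustration of the idea summarized above, the following minimal sketch multiplies each observation by random noise centered at one, passes the result through a soft-truncation function, and averages. It is one possible reading of the description, not the procedure as specified in the paper. The names `soft_truncate` and `robust_mean`, the Catoni-type truncation function, the Gaussian noise, the Monte Carlo averaging, and the parameters `scale`, `noise_std`, and `n_noise` are all assumptions made for this example.

```python
import numpy as np


def soft_truncate(u):
    """Catoni-type soft truncation (assumed choice): roughly linear near zero,
    logarithmic growth for large |u|, odd-symmetric."""
    u = np.asarray(u, dtype=float)
    return np.sign(u) * np.log1p(np.abs(u) + 0.5 * u**2)


def robust_mean(x, scale, noise_std=1.0, n_noise=64, rng=None):
    """Coordinate-wise robust mean of the rows of x (shape (n, d)).

    Each observation is multiplied by Gaussian noise centered at 1 (assumed),
    rescaled, soft-truncated, and averaged over both noise draws and samples.
    The noise average is approximated here by Monte Carlo sampling.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    # One independent noise factor per draw, observation, and coordinate.
    eps = rng.normal(loc=1.0, scale=noise_std, size=(n_noise,) + x.shape)
    truncated = soft_truncate(eps * x / scale)
    # Average over noise draws (axis 0) and observations (axis 1), then rescale.
    return scale * truncated.mean(axis=(0, 1))


if __name__ == "__main__":
    # Hypothetical data: 99 well-behaved points and a single large outlier.
    data = np.array([[1.0]] * 99 + [[1000.0]])
    print("sample mean:", data.mean(axis=0))
    print("robust mean:", robust_mean(data, scale=10.0, rng=0))
```

In a gradient descent loop, `x` would hold the per-example loss gradients at the current iterate, and the returned vector would stand in for the empirical mean gradient in the update. No iterative sub-routine is needed: the estimate is a single pass of elementwise operations followed by an average.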