Published on Sun Jun 06 2021

Towards an Understanding of Benign Overfitting in Neural Networks

Zhu Li, Zhi-Hua Zhou, Arthur Gretton
Abstract

Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss; yet surprisingly, they possess near-optimal prediction performance, contradicting classical learning theory. We examine how these benign overfitting phenomena occur in a two-layer neural network setting where sample covariates are corrupted with noise. We address the high-dimensional regime, where the data dimension grows with the number of data points. Our analysis combines an upper bound on the bias with matching upper and lower bounds on the variance of the interpolator (an estimator that interpolates the data). These results indicate that the excess learning risk of the interpolator decays under mild conditions. We further show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate, which to our knowledge is the first generalization result for such networks. Finally, our theory predicts that the excess learning risk starts to increase once the number of parameters grows beyond a certain threshold, matching recent empirical findings.

Wed Jun 26 2019
Machine Learning
Benign Overfitting in Linear Regression
The phenomenon of benign overfitting is one of the key mysteries uncovered by deep learning methodology. We give a characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal accuracy.
Wed Aug 25 2021
Machine Learning
The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks
The recent success of neural network models has shone light on a rather surprising statistical phenomenon. Statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon has attracted intense theoretical and empirical study.
Sat May 25 2019
Machine Learning
Empirical Risk Minimization in the Interpolating Regime with Application to Neural Network Learning
A common strategy to train deep neural networks (DNNs) is to train them until they (almost) achieve zero training error. In statistical learning theory it is known that over-fitting models may lead to poor generalization properties. So-called interpolation methods have recently received much attention.
Mon Mar 02 2020
Machine Learning
Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime
Deep neural networks can achieve remarkable generalization performances while interpolating the training data perfectly. Rather than the U-curve emblematic of the bias-variance trade-off, their test error often follows a "double descent"
Tue Mar 16 2021
Machine Learning
Deep learning: a statistical viewpoint
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. Simple gradient methods easily find near-optimal solutions to non-convex optimization problems. Despite giving a near-perfect fit to training data without any explicit efforts to control model complexity, these methods exhibit excellent predictive accuracy.
Thu Jun 11 2020
Machine Learning
Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization
We study the generalization performances of standard classifiers in the high-dimensional regime, where the ratio of the number of samples to the dimension is kept finite in the limit of a high dimension and number of samples. We prove a formula for the generalization error achieved