Published on Wed May 05 2021

A Theoretical-Empirical Approach to Estimating Sample Complexity of DNNs

Devansh Bisla, Apoorva Nandini Saridena, Anna Choromanska
Abstract

This paper focuses on understanding how the generalization error of deep neural networks (DNNs) scales with the amount of training data. Existing techniques in statistical learning require computation of capacity measures, such as the VC dimension, to provably bound this error. It is, however, unclear how to extend these measures to DNNs, so the existing analyses apply only to simple networks that are not used in practice, e.g., linear or shallow networks and multi-layer perceptrons. Moreover, many theoretical error bounds are not empirically verifiable. We derive estimates of the generalization error that hold for deep networks and do not rely on unattainable capacity measures. Our approach hinges on two major assumptions: i) the network achieves zero training error, and ii) the probability of making an error on a test point is proportional to the distance between this point and its nearest training point in the feature space, saturating beyond a certain maximal distance that we call the radius. Based on these assumptions we estimate the generalization error of DNNs. The resulting estimate scales as O(1/(\delta N^{1/d})), where N is the size of the training set; it is parameterized by two quantities, the effective dimensionality of the data as perceived by the network (d) and the aforementioned radius (\delta), both of which we determine empirically. We show that our estimates match the experimentally observed behavior of the error on multiple learning tasks using benchmark data sets and realistic models. Estimating training data requirements is essential for deploying safety-critical applications such as autonomous driving, and collecting and annotating training data demands substantial financial, computational, and human resources. Our empirical estimates can help allocate these resources efficiently.
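The scaling law above lends itself to a simple empirical check. The sketch below is not the authors' implementation; the helper names, the synthetic features standing in for a trained network's feature space, and the assumed radius delta are all illustrative. It estimates the effective dimensionality d from how the mean nearest-neighbor distance shrinks with the training-set size, then evaluates the O(1/(\delta N^{1/d})) estimate for several values of N.

```python
# Minimal sketch (not the authors' implementation): estimate the effective
# dimensionality d from how the mean nearest-neighbor distance in feature
# space shrinks with the training-set size N, then evaluate the scaling-law
# estimate ~ 1 / (delta * N^(1/d)). Features, sizes, and delta are illustrative.
import numpy as np

def mean_nn_distance(train_feats, test_feats):
    """Mean distance from each test feature to its nearest training feature."""
    # Brute-force pairwise distances; adequate for a small illustrative example.
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min(axis=1).mean()

def estimate_effective_dimension(train_feats, test_feats, subset_sizes, seed=0):
    """Fit d from mean NN distance ~ C * N^(-1/d) via a log-log regression."""
    rng = np.random.default_rng(seed)
    log_n, log_dist = [], []
    for n in subset_sizes:
        idx = rng.choice(len(train_feats), size=n, replace=False)
        log_n.append(np.log(n))
        log_dist.append(np.log(mean_nn_distance(train_feats[idx], test_feats)))
    slope, _ = np.polyfit(log_n, log_dist, 1)   # slope is roughly -1/d
    return -1.0 / slope

def predicted_error(n_train, d, delta):
    """Error estimate scaling as O(1/(delta * N^(1/d))), saturating at 1."""
    return min(1.0, 1.0 / (delta * n_train ** (1.0 / d)))

if __name__ == "__main__":
    # Synthetic stand-in for penultimate-layer features of a trained DNN.
    rng = np.random.default_rng(0)
    train = rng.normal(size=(2000, 8))   # 2000 "training" features, 8-dim
    test = rng.normal(size=(200, 8))
    d_hat = estimate_effective_dimension(train, test, [250, 500, 1000, 2000])
    delta = 2.0                          # assumed saturation radius
    for n in (1_000, 10_000, 100_000):
        print(f"N = {n:>7d}  predicted error ~ {predicted_error(n, d_hat, delta):.4f}")
```

In practice one would replace the synthetic arrays with features extracted from a trained network and choose the radius empirically, as the paper describes.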

Mon Dec 18 2017
Neural Networks
Size-Independent Sample Complexity of Neural Networks
We study the sample complexity of learning neural networks. We provide new bounds on Rademacher complexity assuming norm constraints on the parameter matrix of each layer.
Fri Jul 14 2017
Machine Learning
On the Complexity of Learning Neural Networks
The stunning empirical successes of neural networks currently lack rigorous theoretical explanation. What form would such an explanation take, in the face of existing complexity-theoretic lower bounds? We demonstrate here a comprehensive lower bound ruling out the possibility that even simple networks, under benign input distributions, can be learned efficiently.
Tue Jun 08 2021
Machine Learning
What training reveals about neural network complexity
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced from how fast its weights change during training. More complex networks have a longer trajectory, larger variance, and often veer further from their initialization.
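The trajectory statistics mentioned here are straightforward to log during training. The following toy sketch (not the paper's models or experimental setup; the tiny NumPy network and synthetic data are stand-ins) records trajectory length, variance of the per-step weight movement, and distance from initialization while training with SGD.

```python
# Toy illustration (not the paper's setup) of the trajectory statistics the
# abstract mentions: total trajectory length, variance of the per-step weight
# movement, and final distance from initialization, logged while training a
# tiny one-hidden-layer NumPy network with SGD on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = (X @ rng.normal(size=10) > 0).astype(float)

W1 = rng.normal(scale=0.1, size=(10, 32))   # input -> hidden
W2 = rng.normal(scale=0.1, size=(32, 1))    # hidden -> output

def flat():
    """Current weights flattened into a single vector."""
    return np.concatenate([W1.ravel(), W2.ravel()])

w_init, w_prev, step_sizes = flat(), flat(), []

lr = 0.05
for step in range(500):
    idx = rng.choice(len(X), size=32, replace=False)
    xb, yb = X[idx], y[idx, None]
    h = np.tanh(xb @ W1)                    # forward pass
    err = h @ W2 - yb                       # squared-loss residual
    gW2 = h.T @ err / len(xb)               # backprop: output layer
    gW1 = xb.T @ ((err @ W2.T) * (1 - h ** 2)) / len(xb)   # hidden layer
    W1 -= lr * gW1
    W2 -= lr * gW2
    w_cur = flat()
    step_sizes.append(np.linalg.norm(w_cur - w_prev))       # per-step movement
    w_prev = w_cur

print("trajectory length      :", float(np.sum(step_sizes)))
print("variance of step sizes :", float(np.var(step_sizes)))
print("distance from init     :", float(np.linalg.norm(flat() - w_init)))
```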
Thu May 09 2019
Machine Learning
Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Existing Rademacher complexity bounds for neural networks rely only on norm control. In practice, many data-dependent techniques such as Batchnorm improve generalization. For feedforward neural nets as well as RNNs, we obtain tighter complexity bounds.
Fri Mar 31 2017
Machine Learning
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
One of the defining properties of deep learning is that models are chosen to have many more parameters than available training data. To explain generalization in this regime, we therefore need nonvacuous bounds. We return to an idea by Langford and Caruana (2001) to compute nonvacuous generalization bounds for such networks.
Sun Oct 05 2014
Artificial Intelligence
On the Computational Efficiency of Training Neural Networks
It is well-known that neural networks are computationally hard to train. On the other hand, in practice, modern day neural networks can be trained efficiently using SGD.