Published on Wed Jan 02 2019

The capacity of feedforward neural networks

Pierre Baldi, Roman Vershynin

The capacity of an architecture is defined as the binary logarithm of the number of functions it can compute as the synaptic weights are varied. We show that the capacity is given by a cubic polynomial in the layer sizes and use this result to identify architectures with maximal or minimal capacity.

Abstract

A long-standing open problem in the theory of neural networks is the development of quantitative methods to estimate and compare the capabilities of different architectures. Here we define the capacity of an architecture by the binary logarithm of the number of functions it can compute, as the synaptic weights are varied. The capacity provides an upper bound on the number of bits that can be extracted from the training data and stored in the architecture during learning. We study the capacity of layered, fully-connected architectures of linear threshold neurons with layers of size $n_1,n_2, \ldots, n_L$ and show that in essence the capacity is given by a cubic polynomial in the layer sizes: $C(n_1,\ldots, n_L)=\sum_{k=1}^{L-1} \min(n_1,\ldots,n_k)n_kn_{k+1}$, where layers that are smaller than all previous layers act as bottlenecks. In proving the main result, we also develop new techniques (multiplexing, enrichment, and stacking) as well as new bounds on the capacity of finite sets. We use the main result to identify architectures with maximal or minimal capacity under a number of natural constraints. This leads to the notion of structural regularization for deep architectures. While in general, everything else being equal, shallow networks compute more functions than deep networks, the functions computed by deep networks are more regular and "interesting".
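As a concrete illustration of the capacity formula in the abstract, the short Python sketch below evaluates C(n_1, ..., n_L) for a given list of layer sizes; the function name capacity and the example architectures are illustrative choices, not taken from the paper.

def capacity(layer_sizes):
    # Evaluate C(n_1, ..., n_L) = sum_{k=1}^{L-1} min(n_1, ..., n_k) * n_k * n_{k+1},
    # where the running minimum reflects bottleneck layers (layers smaller than
    # all previous layers) limiting the capacity of later connections.
    total = 0
    running_min = float("inf")
    for k in range(len(layer_sizes) - 1):
        running_min = min(running_min, layer_sizes[k])   # min(n_1, ..., n_k)
        total += running_min * layer_sizes[k] * layer_sizes[k + 1]
    return total

# Illustrative examples (architectures chosen for demonstration only):
print(capacity([10, 1]))      # 10*10*1 = 100
print(capacity([10, 5, 1]))   # 10*10*5 + 5*5*1 = 525
print(capacity([5, 10, 1]))   # input bottleneck: 5*5*10 + 5*10*1 = 300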

Mon Mar 09 2015
Machine Learning
Deep Learning and the Information Bottleneck Principle
Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We argue that the optimal architecture, i.e., the number of layers and the features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff.
Fri Sep 03 2021
Machine Learning
Dive into Layers: Neural Network Capacity Bounding using Algebraic Geometry
The learnability of a neural network is directly related to its size. We borrow a tool from algebraic topology, Betti numbers, to measure the topological complexity of the input data and of the neural network. The code will be publicly available.
Sun Aug 20 2017
Neural Networks
A Capacity Scaling Law for Artificial Neural Networks
We derive the calculation of two critical numbers predicting the behavior of neural networks. The LM dimension is a generalization of the Vapnik-Chervonenkis dimension that avoids structured data. The MacKay dimension indicates a 50% chance of not being able to train a given function.
Fri Feb 22 2019
Machine Learning
Capacity allocation through neural network layers
In the highly non-linear limit where decoupling is total, we show that the propagation of capacity through the layers follows a simple Markovian rule. This allows us to recover some known results about deep neural networks, such as the size of the effective receptive field.
Wed Oct 03 2018
Machine Learning
Understanding Weight Normalized Deep Neural Networks with Rectified Linear Units
This paper presents a general framework for norm-based capacity control. We establish an upper bound on the Rademacher complexity of this family. We discuss properties of a width-independent capacity control, which depends on depth only through a square root factor.
Mon Mar 11 2019
Machine Learning
Scaling up deep neural networks: a capacity allocation perspective
The shattering problem in deep neural networks can only be avoided if the capacity propagation through layers has a non-degenerate continuous limit. We find that weights and biases should be scaled down as the inverse square root of the number of layers for deep residual networks.
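A minimal sketch of the scaling prescription summarized above, assuming a toy residual stack in Python/NumPy; the function init_residual_branches and the chosen sizes are illustrative, not code from the paper.

import numpy as np

def init_residual_branches(num_layers, width, seed=0):
    # Initialize each residual branch with weights and biases scaled down by
    # 1 / sqrt(num_layers), the depth-dependent factor described above.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(num_layers)
    return [
        {"W": rng.standard_normal((width, width)) * scale,
         "b": rng.standard_normal(width) * scale}
        for _ in range(num_layers)
    ]

# Illustrative usage: a 64-block residual stack of width 128.
branches = init_residual_branches(num_layers=64, width=128)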