Published on Mon Oct 28 2019

On Generalization Bounds of a Family of Recurrent Neural Networks

Minshuo Chen, Xingguo Li, Tuo Zhao

Recurrent Neural Networks (RNNs) have been widely applied to sequential data analysis. Due to their complicated modeling structures, however, the theory behind them is still largely missing. We study the generalization properties of vanilla RNNs as well as their variants.

Abstract

Recurrent Neural Networks (RNNs) have been widely applied to sequential data analysis. Due to their complicated modeling structures, however, the theory behind them is still largely missing. To connect theory and practice, we study the generalization properties of vanilla RNNs as well as their variants, including Minimal Gated Unit (MGU), Long Short Term Memory (LSTM), and Convolutional (Conv) RNNs. Specifically, our theory is established under the PAC-Learning framework. The generalization bound is presented in terms of the spectral norms of the weight matrices and the total number of parameters. We also establish refined generalization bounds with additional norm assumptions, and draw a comparison among these bounds. We remark: (1) Our generalization bound for vanilla RNNs is significantly tighter than the best of existing results; (2) We are not aware of any other generalization bounds for MGU, LSTM, and Conv RNNs in the existing literature; (3) We demonstrate the advantages of these variants in generalization.
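As a rough illustration of what such a result looks like, the sketch below shows the generic shape of a spectral-norm-based PAC-learning bound; the symbols used here (population risk R, empirical risk R-hat, sample size m, sequence length t, confidence level delta, weight matrices U, W, V, and total parameter count d) and the exact dependence are illustrative assumptions, not the paper's precise statement.

\[
R(f) \;\le\; \widehat{R}(f) \;+\; \mathcal{O}\!\left(\sqrt{\frac{d \cdot \mathrm{poly}\!\left(t,\ \|U\|_2,\ \|W\|_2,\ \|V\|_2\right) \;+\; \log(1/\delta)}{m}}\right)
\quad \text{with probability at least } 1-\delta .
\]

In words, the gap between the population risk and the empirical risk shrinks at the usual 1/sqrt(m) rate, with a complexity factor governed by the spectral norms of the weight matrices and the total number of parameters, as stated in the abstract above.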

Mon Feb 04 2019
Neural Networks
Can SGD Learn Recurrent Neural Networks with Provable Generalization?
Recurrent Neural Networks (RNNs) are among the most popular models in sequential data analysis. Existing generalization bounds for RNNs scale exponentially with the input length. In this paper, we show that, using vanilla stochastic gradient descent (SGD), an RNN
Mon Oct 29 2018
Neural Networks
On the Convergence Rate of Training Recurrent Neural Networks
How can local-search methods such as stochastic gradient descent (SGD) avoid bad local minima in training multi-layer neural networks? Why can they fit random labels even given non-convex and non-smooth architectures?
Thu Jan 12 2017
Neural Networks
Simplified Minimal Gated Unit Variations for Recurrent Neural Networks
Recurrent neural networks with various types of hidden units have been used to solve a diverse range of problems involving sequence data. Two of the most recent proposals, gated recurrent units (GRU) and minimal gated units (MGU), have shown comparably promising results on example public datasets.
Tue Mar 09 2021
Machine Learning
UnICORNN: A recurrent model for learning very long time dependencies
The design of recurrent neural networks (RNNs) is very challenging on account of the exploding and vanishing gradient problem. To overcome this, we propose a structure preserving discretization of a Hamiltonian system of second-order ordinary differential equations.
Mon Oct 31 2016
Neural Networks
Full-Capacity Unitary Recurrent Neural Networks
Recurrent neural networks are powerful models for processing sequential data. But they are generally plagued by vanishing and exploding gradient problems. We propose full-capacity uRNNs that optimize their recurrence matrix over all unitary matrices.
Wed Oct 25 2017
Neural Networks
On the Long-Term Memory of Deep Recurrent Networks
A key attribute that drives the unprecedented success of modern Recurrent Neural Networks (RNNs) is their ability to model intricate long-term temporal dependencies. However, a well-established measure of RNNs' long-term memory capacity is lacking. We introduce a measure of the network's ability to support information flow across time.