Published on Tue Jan 29 2019

Numerically Recovering the Critical Points of a Deep Linear Autoencoder

Charles G. Frye, Neha S. Wadia, Michael R. DeWeese, Kristofer E. Bouchard

Numerically locating the critical points of non-convex surfaces is a long-standing problem central to many fields. We find that several methods work well at recovering certain information about loss surfaces, but fail to take an unbiased sample of critical points. We also identify a recently-published Newton method for optimization that outperforms previous methods as a critical point-finding algorithm.

Abstract

Numerically locating the critical points of non-convex surfaces is a long-standing problem central to many fields. Recently, the loss surfaces of deep neural networks have been explored to gain insight into outstanding questions in optimization, generalization, and network architecture design. However, the degree to which recently-proposed methods for numerically recovering critical points actually do so has not been thoroughly evaluated. In this paper, we examine this issue in a case for which the ground truth is known: the deep linear autoencoder. We investigate two sub-problems associated with numerical critical point identification: first, because of large parameter counts, it is infeasible to find all of the critical points for contemporary neural networks, necessitating sampling approaches whose characteristics are poorly understood; second, the numerical tolerance for accurately identifying a critical point is unknown, and conservative tolerances are difficult to satisfy. We first identify connections between recently-proposed methods and well-understood methods in other fields, including chemical physics, economics, and algebraic geometry. We find that several methods work well at recovering certain information about loss surfaces, but fail to take an unbiased sample of critical points. Furthermore, numerical tolerance must be very strict to ensure that numerically-identified critical points have similar properties to true analytical critical points. We also identify a recently-published Newton method for optimization that outperforms previous methods as a critical point-finding algorithm. We expect our results will guide future attempts to numerically study critical points in large nonlinear neural networks.
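The abstract describes two ingredients that can be illustrated concretely: a deep linear autoencoder loss whose critical points are analytically known, and a numerical procedure that drives the gradient norm below a tolerance. The sketch below is not the authors' code; it assumes a squared-error reconstruction loss, substitutes plain gradient descent on the squared gradient norm for the Newton-type method discussed in the paper, and the layer sizes, learning rate `lr`, and tolerance `tol` are illustrative choices only.

```python
# Minimal sketch of numerical critical point finding on a deep linear
# autoencoder. Assumptions: squared-error reconstruction loss, gradient
# descent on ||grad L||^2 as a stand-in for the paper's Newton-type method,
# and arbitrary toy layer sizes / hyperparameters.
import jax
import jax.numpy as jnp

def loss(params, X):
    """Reconstruction loss of a deep linear autoencoder: 0.5 * ||X - W_L ... W_1 X||^2."""
    Z = X
    for W in params:
        Z = W @ Z
    return 0.5 * jnp.mean(jnp.sum((X - Z) ** 2, axis=0))

def squared_grad_norm(params, X):
    """Auxiliary objective g(theta) = ||grad L(theta)||^2; points with g = 0
    are exactly the critical points of the loss."""
    grads = jax.grad(loss)(params, X)
    return sum(jnp.sum(g ** 2) for g in grads)

def find_critical_point(params, X, lr=1e-2, tol=1e-10, max_iter=50_000):
    """Gradient descent on the squared gradient norm, stopping once
    ||grad L||^2 falls below a (hypothetical) tolerance `tol`."""
    g_fn = jax.jit(jax.value_and_grad(squared_grad_norm))
    for _ in range(max_iter):
        g_val, g_grads = g_fn(params, X)
        if g_val < tol:
            break
        params = [W - lr * dW for W, dW in zip(params, g_grads)]
    return params, g_val

# Toy usage: 3-layer linear autoencoder with a rank-4 bottleneck on 8-dimensional data.
key = jax.random.PRNGKey(0)
k1, k2, k3, kx = jax.random.split(key, 4)
X = jax.random.normal(kx, (8, 200))
params = [0.1 * jax.random.normal(k1, (4, 8)),
          0.1 * jax.random.normal(k2, (4, 4)),
          0.1 * jax.random.normal(k3, (8, 4))]
params, g_val = find_critical_point(params, X)
print("final squared gradient norm:", g_val)
```

As the abstract emphasizes, the stopping tolerance is the crux: with a loose `tol`, points that pass the check can have properties (for example, Hessian spectra) quite different from those of true analytical critical points.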

Tue Feb 12 2019
Machine Learning
Towards moderate overparameterization: global convergence guarantees for training shallow neural networks
Many modern neural network architectures are trained in an overparameterized regime, where the parameters of the model exceed the size of the training data. In practice, much more moderate levels of overparameterization seem to be sufficient.
Mon May 20 2019
Machine Learning
Shaping the learning landscape in neural networks around wide flat minima
Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex, high-dimensional loss function. The learning process is observed to be able to find good minima without getting stuck in local critical points. How these two features can be kept under control in nonlinear devices composed of millions of connections is a
Mon Mar 05 2018
Machine Learning
Energy-entropy competition and the effectiveness of stochastic gradient descent in machine learning
Finding parameters that minimise a loss function is at the core of many machine learning methods. The Stochastic Gradient Descent algorithm is widely used and delivers state-of-the-art results for many problems. However, it typically cannot find the global minimum, and thus its effectiveness is hitherto mysterious.
Tue Dec 13 2016
Machine Learning
An empirical analysis of the optimization of deep network loss surfaces
The success of deep neural networks hinges on our ability to accurately and efficiently optimize high-dimensional, non-convex functions. In this paper, we investigate the loss functions of state-of-the-art networks.
Tue Dec 25 2018
Machine Learning
Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?
Many modern learning tasks involve fitting nonlinear models to data which are trained in an overparameterized regime where the parameters of the model exceed the size of the training dataset. The training loss may have infinitely many global minima, and it is critical to understand the properties of the solutions found by first-order optimization.
Wed Sep 26 2018
Machine Learning
Stochastic Second-order Methods for Non-convex Optimization with Inexact Hessian and Gradient
Trust region and cubic regularization methods have demonstrated good performance in small-scale non-convex optimization. Each iteration of these methods involves computation of the gradient, Hessian, and function value. However, exactly computing those quantities is too expensive in large-scale problems such as training deep networks.