Published on Thu May 16 2019

Efficient Optimization of Loops and Limits with Randomized Telescoping Sums

Alex Beatson, Ryan P. Adams

We consider optimization problems in which the objective requires an inner loop with many steps or is the limit of a sequence of increasingly costly approximations. We identify conditions under which randomized telescope (RT) estimators achieve convergence rates independent of the length of the loop or the accuracy of the approximation.

Abstract

We consider optimization problems in which the objective requires an inner loop with many steps or is the limit of a sequence of increasingly costly approximations. Meta-learning, training recurrent neural networks, and optimization of the solutions to differential equations are all examples of optimization problems with this character. In such problems, it can be expensive to compute the objective function value and its gradient, but truncating the loop or using less accurate approximations can induce biases that damage the overall solution. We propose randomized telescope (RT) gradient estimators, which represent the objective as the sum of a telescoping series and sample linear combinations of terms to provide cheap unbiased gradient estimates. We identify conditions under which RT estimators achieve optimization convergence rates independent of the length of the loop or the required accuracy of the approximation. We also derive a method for tuning RT estimators online to maximize a lower bound on the expected decrease in loss per unit of computation. We evaluate our adaptive RT estimators on a range of applications including meta-optimization of learning rates, variational inference of ODE parameters, and training an LSTM to model long sequences.
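To make the construction concrete, here is a minimal sketch of a single-sample randomized telescope estimator on a toy problem: the gradient of the full-length objective is written as a telescoping sum of differences between successive truncations, one truncation level is sampled from a proposal distribution, and the sampled difference is importance-weighted so that the estimate is unbiased. The toy surrogate grad_partial, the geometric proposal, and all constants are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of a single-sample randomized telescope (RT) gradient
# estimator. Not the authors' code; grad_partial, the geometric proposal,
# and the quadratic toy objective are illustrative assumptions.

H = 100  # full loop length / finest approximation level


def grad_partial(theta, n):
    """Gradient of the n-step (truncated) objective L_n at theta.

    Toy stand-in: L_n(theta) = 0.5 * (1 - 2**-n) * theta**2, which
    converges to 0.5 * theta**2 (and its gradient to theta) as n grows.
    """
    return (1.0 - 2.0 ** (-n)) * theta


def rt_gradient(theta, rng, p=0.25):
    """Unbiased single-sample RT estimate of grad L_H(theta).

    Writes grad L_H = sum_{n=1}^{H} (grad L_n - grad L_{n-1}) with L_0 = 0,
    samples one index n from a truncated geometric proposal q(n), and
    returns the sampled difference divided by q(n), which is unbiased
    by construction.
    """
    levels = np.arange(1, H + 1)
    q = p * (1 - p) ** (levels - 1)
    q /= q.sum()                      # truncated geometric proposal over 1..H
    n = rng.choice(levels, p=q)
    g_n = grad_partial(theta, n)
    g_prev = grad_partial(theta, n - 1) if n > 1 else 0.0
    return (g_n - g_prev) / q[n - 1]  # importance-weighted telescoping term


rng = np.random.default_rng(0)
theta = 1.0
estimates = [rt_gradient(theta, rng) for _ in range(20000)]
print(np.mean(estimates), grad_partial(theta, H))  # sample mean roughly matches the full gradient
```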

Mon Jun 08 2020
Machine Learning
Stochastic Optimization with Non-stationary Noise
We investigate stochastic optimization problems under relaxed assumptions on the distribution of noise. Standard results on optimal convergence rates assume either that there is a uniform bound on the second moment of the gradient noise or that the noise decays as the algorithm progresses. When the variation in the noise is known, …
Mon Sep 25 2017
Machine Learning
Stochastic Nonconvex Optimization with Large Minibatches
We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks. Our algorithms provably converge to an approximate critical point of the expected objective with faster rates than minibatch stochastic gradient descent.
Thu Jul 20 2017
Machine Learning
Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains
We consider the minimization of an objective function given access to unbiased estimates of its gradient through stochastic gradient descent (SGD) with a constant step-size. We then show that Richardson-Romberg extrapolation may be used to get closer to the global optimum.
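As a rough illustration of the extrapolation idea in that summary, the sketch below runs Polyak-Ruppert averaged SGD with a constant step-size gamma and with gamma/2, then combines the two averages to cancel the leading O(gamma) bias. The non-quadratic toy objective, noise model, and constants are assumptions made for this demo, not the paper's setup.

```python
import numpy as np

# Illustrative sketch (not the paper's code) of Richardson-Romberg
# extrapolation for constant step-size SGD: run averaged SGD with steps
# gamma and gamma/2, then combine 2*avg_half - avg_full to cancel the
# leading O(gamma) bias of the averaged iterate. The objective
# f(x) = exp(x) - x (minimum at 0) and the noise model are assumptions.

rng = np.random.default_rng(0)


def averaged_sgd(gamma, steps=200_000, x0=1.0):
    """Averaged constant step-size SGD on f(x) = exp(x) - x with noisy gradients."""
    x, running_sum = x0, 0.0
    for _ in range(steps):
        grad = np.exp(x) - 1.0 + rng.normal()   # unbiased but noisy gradient of f
        x -= gamma * grad
        running_sum += x
    return running_sum / steps


gamma = 0.4
avg_full = averaged_sgd(gamma)
avg_half = averaged_sgd(gamma / 2)
extrapolated = 2.0 * avg_half - avg_full        # Richardson-Romberg combination
print(avg_full, avg_half, extrapolated)          # extrapolated is typically closest to 0
```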
Tue Jul 28 2015
Neural Networks
Training recurrent networks online without backtracking
"NoBackTrack" algorithm to train parameters of dynamical systems such as recurrent neural networks. Algorithm works in an online,memoryless setting, thus requiring no backpropagation through time.
Thu Feb 18 2021
Machine Learning
On the Convergence of Step Decay Step-Size for Stochastic Optimization
The convergence of stochastic gradient descent is highly dependent on the step-size. We provide convergence results for the step decay step-size in the non-convex regime, as well as guarantees for general (possibly non-smooth) convex problems.
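For readers unfamiliar with the schedule, a step decay step-size keeps the step-size piecewise constant and cuts it by a fixed factor after each block of iterations. The short sketch below shows one such schedule driving SGD on a toy quadratic; the decay factor, block length, and objective are chosen purely for illustration.

```python
import numpy as np

# Minimal sketch of a step decay step-size schedule for SGD: keep the
# step-size constant for a block of iterations, then cut it by a fixed
# factor. The toy quadratic objective, decay factor, and block length
# are illustrative assumptions, not the paper's experimental setup.

rng = np.random.default_rng(0)

T = 3000          # total iterations
block = 500       # iterations per stage
eta0 = 0.5        # initial step-size
decay = 0.5       # multiplicative cut applied after each stage

x = 5.0
for t in range(T):
    eta = eta0 * decay ** (t // block)        # piecewise-constant schedule
    grad = x + rng.normal(scale=1.0)           # noisy gradient of 0.5 * x**2
    x -= eta * grad
print(x)   # the iterate settles near the optimum at 0 as the step-size shrinks
```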
Tue Feb 28 2012
Machine Learning
A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
The proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, experiments indicate that the new algorithm can dramatically outperform standard algorithms.
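The memory-of-previous-gradients idea can be illustrated with a SAG-style update on a small least-squares problem: each iteration refreshes the stored gradient of one randomly chosen example and then steps along the average of all stored gradients. The problem instance, step-size, and iteration budget below are assumptions for the demo, not the paper's experiments.

```python
import numpy as np

# Illustrative sketch of a stochastic gradient method that keeps a memory of
# previous per-example gradients (a SAG-style update). The least-squares
# problem, conservative step-size, and iteration budget are assumptions.

rng = np.random.default_rng(0)

# Toy finite training set: least squares on (A, b).
n, d = 200, 10
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
b = A @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
memory = np.zeros((n, d))                 # last gradient computed for each example
grad_sum = memory.sum(axis=0)             # running sum of the stored gradients
L = np.max(np.sum(A ** 2, axis=1))        # per-example smoothness constant
step = 1.0 / (16.0 * L)                   # small, conservative constant step-size

for t in range(50 * n):
    i = rng.integers(n)
    g_i = (A[i] @ w - b[i]) * A[i]        # gradient of 0.5 * (a_i . w - b_i)**2
    grad_sum += g_i - memory[i]           # swap in example i's fresh gradient
    memory[i] = g_i
    w -= step * grad_sum / n              # step along the average of the memory

print(np.linalg.norm(w - w_true))         # gap to the true weights shrinks over training
```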