Published on Wed Aug 13 2014

Asymptotic and finite-sample properties of estimators based on stochastic gradients

Panos Toulis, Edoardo M. Airoldi


Abstract

Stochastic gradient descent procedures have gained popularity for parameter estimation from large data sets. However, their statistical properties are not well understood in theory, and in practice avoiding numerical instability requires careful tuning of key parameters. Here, we introduce implicit stochastic gradient descent procedures, which involve parameter updates that are implicitly defined. Intuitively, implicit updates shrink standard stochastic gradient descent updates. The amount of shrinkage depends on the observed Fisher information matrix, which does not need to be explicitly computed; thus, implicit procedures increase stability without increasing the computational burden. Our theoretical analysis provides the first full characterization of the asymptotic behavior of both standard and implicit stochastic gradient descent-based estimators, including finite-sample error bounds. Importantly, analytical expressions for the variances of these stochastic gradient-based estimators reveal their exact loss of efficiency. We also develop new algorithms to compute implicit stochastic gradient descent-based estimators in practice, for generalized linear models, Cox proportional hazards models, and M-estimators, and we perform extensive experiments. Our results suggest that implicit stochastic gradient descent procedures are poised to become a workhorse for approximate inference from large data sets.
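To illustrate the shrinkage the abstract describes, consider the linear model with squared-error loss, where the implicit update happens to have a closed form: the standard SGD step scaled by 1 / (1 + lr * ||x||^2). The sketch below is an illustration under these toy assumptions, not the authors' implementation; the data, learning-rate schedule, and parameter values are invented for the example.

```python
import numpy as np

def explicit_sgd_step(theta, x, y, lr):
    """Standard SGD step for the squared-error loss 0.5 * (y - x'theta)^2."""
    residual = y - x @ theta
    return theta + lr * residual * x

def implicit_sgd_step(theta, x, y, lr):
    """Implicit SGD step: the gradient is evaluated at the NEW iterate.
    For the linear model the fixed point is available in closed form:
    the explicit step shrunk by 1 / (1 + lr * ||x||^2), so no matrix
    needs to be formed or inverted."""
    shrink = 1.0 / (1.0 + lr * (x @ x))
    residual = y - x @ theta
    return theta + lr * shrink * residual * x

# Fit a small linear regression with implicit SGD (illustrative settings).
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0])
theta = np.zeros(2)
for n in range(1, 5001):
    x = rng.normal(size=2)
    y = x @ theta_true + 0.1 * rng.normal()
    theta = implicit_sgd_step(theta, x, y, lr=1.0 / n)  # Robbins-Monro rate

print(theta)  # should be close to theta_true
```

Because the shrinkage factor is at most 1, an implicit step is never larger than the corresponding explicit step, which is the source of the stability gain at essentially no extra cost.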

Tue Sep 22 2015
Machine Learning
Stochastic gradient descent methods for estimation with large data sets
We develop methods for parameter estimation in settings where traditional methods are no longer tenable. Our methods rely on stochastic approximations, which are computationally efficient. Our sgd package in R offers the most extensive and robust implementation of stochastic gradient descent methods.
Mon May 27 2019
Machine Learning
One Method to Rule Them All: Variance Reduction for Data, Parameters and Many New Methods
We propose a remarkably general variance-reduced method suitable for solving empirical risk minimization problems. We provide a single theorem establishing linear convergence of the method under smoothness and quasi-strong convexity assumptions. With this theorem we recover best-known and sometimes improved rates for known methods.
Sat Jul 01 2017
Machine Learning
On Scalable Inference with Stochastic Gradient Descent
Stochastic gradient descent (SGD) has gained increasing popularity due to its numerical convenience and memory efficiency. Traditional resampling methods such as the bootstrap are not computationally feasible since they require repeatedly drawing independent samples from the entire dataset.
Thu Oct 27 2016
Machine Learning
Statistical Inference for Model Parameters in Stochastic Gradient Descent
The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data. We investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain smoothness conditions.
Mon Feb 19 2018
Artificial Intelligence
Robust Estimation via Robust Gradient Estimation
We provide a new computationally-efficient class of estimators for risk minimization. We show that these estimators are robust for general statistical models. Our workhorse is a novel robust variant of gradient descent.
Mon Feb 08 2016
Machine Learning
A Variational Analysis of Stochastic Gradient Algorithms
Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm.
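A minimal sketch of the idea in this abstract, under toy assumptions (a one-dimensional Gaussian-mean problem, not the paper's variational analysis): with a constant learning rate the SGD iterates never settle on a point, so after a burn-in period they can be collected as correlated draws from a stationary distribution around the minimizer.

```python
import numpy as np

# Toy problem: estimate the mean mu of N(mu, 1) data by minimizing
# the per-observation loss 0.5 * (theta - y)^2 with constant-rate SGD.
rng = np.random.default_rng(1)
mu_true = 3.0
lr = 0.1            # constant rate: iterates fluctuate instead of converging
theta = 0.0
burn_in, n_keep = 500, 20000
samples = []
for t in range(burn_in + n_keep):
    y = rng.normal(mu_true, 1.0)   # stream one observation
    theta -= lr * (theta - y)      # SGD step on 0.5 * (theta - y)^2
    if t >= burn_in:
        samples.append(theta)
samples = np.array(samples)

# For this quadratic loss the stationary distribution is centered at mu_true
# with variance lr / (2 - lr), so the spread is controlled by the rate.
print(samples.mean(), samples.std())
```

Shrinking the learning rate tightens the stationary distribution around the minimizer, which is the knob the approximate-posterior-inference view exploits.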