Published on Tue Feb 05 2019

Distribution-Dependent Analysis of Gibbs-ERM Principle

Ilja Kuzborskij, Nicolò Cesa-Bianchi, Csaba Szepesvári

Gibbs-ERM learning is a natural idealized model of learning with stochastic optimization algorithms. We show that the distribution-dependent excess risk is essentially controlled by the effective dimension of the problem.

Abstract

Gibbs-ERM learning is a natural idealized model of learning with stochastic optimization algorithms (such as Stochastic Gradient Langevin Dynamics and ---to some extent--- Stochastic Gradient Descent), while it also arises in other contexts, including PAC-Bayesian theory and sampling mechanisms. In this work we study the excess risk suffered by a Gibbs-ERM learner that uses non-convex, regularized empirical risk, with the goal of understanding the interplay between the data-generating distribution and learning in large hypothesis spaces. Our main results are distribution-dependent upper bounds on several notions of excess risk. We show that, in all cases, the distribution-dependent excess risk is essentially controlled by the effective dimension $\mathrm{tr}\left(\boldsymbol{H}^{\star} (\boldsymbol{H}^{\star} + \lambda \boldsymbol{I})^{-1}\right)$ of the problem, where $\boldsymbol{H}^{\star}$ is the Hessian matrix of the risk at a local minimum. This is a well-established notion of effective dimension appearing in several previous works, including the analyses of SGD and ridge regression, but ours is the first work that brings this dimension to the analysis of learning using Gibbs densities. The distribution-dependent view we advocate here improves upon earlier results of Raginsky et al. (2017), and can yield much tighter bounds depending on the interplay between the data-generating distribution and the loss function. The first part of our analysis focuses on the localized excess risk in the vicinity of a fixed local minimizer. This result is then extended to bounds on the global excess risk, by characterizing probabilities of local minima (and their complement) under Gibbs densities, a result which might be of independent interest.
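The effective dimension $\mathrm{tr}\left(\boldsymbol{H}^{\star} (\boldsymbol{H}^{\star} + \lambda \boldsymbol{I})^{-1}\right)$ is straightforward to compute numerically. A minimal sketch, assuming a positive semi-definite stand-in for the Hessian $\boldsymbol{H}^{\star}$ (the matrix and regularization values below are illustrative, not taken from the paper):

```python
import numpy as np


def effective_dimension(H, lam):
    """Compute tr(H (H + lam * I)^{-1}) via the eigenvalues of H.

    Each eigenvalue h of H contributes h / (h + lam), so directions
    with h >> lam count as ~1 and directions with h << lam count as ~0.
    """
    eigvals = np.linalg.eigvalsh(H)
    return float(np.sum(eigvals / (eigvals + lam)))


# Illustrative PSD matrix playing the role of the risk Hessian H*.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = A @ A.T

# As lam -> 0 the effective dimension approaches rank(H); as lam grows,
# directions of low curvature are suppressed and the dimension shrinks.
print(effective_dimension(H, 1e-8))
print(effective_dimension(H, 1e3))
```

The eigenvalue form makes the interpolation explicit: the effective dimension counts, softly, how many curvature directions of $\boldsymbol{H}^{\star}$ exceed the regularization level $\lambda$.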

Fri Oct 04 2019
Machine Learning
Nonasymptotic estimates for Stochastic Gradient Langevin Dynamics under local conditions in nonconvex optimization
We present non-asymptotic error bounds for Stochastic Gradient Langevin Dynamics. These bounds are derived in the absence of log-concavity of the target distribution. We present two key paradigms within the framework of scalable posterior sampling.
Wed Oct 21 2020
Machine Learning
On Random Subset Generalization Error Bounds and the Stochastic Gradient Langevin Dynamics Algorithm
In this work, we unify several expected generalization error bounds based on random subsets using the framework developed by Hellstr\"om and Durisi [1]. We extend the bounds from Haghifam et al. [5] for Langevin dynamics to stochastic gradient dynamics.
Sun Feb 18 2018
Machine Learning
Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability
We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization. For a particular local optimum of the empirical risk, with an arbitrary initialization, we show that, with high probability, at least one of the following two events will occur.
Wed Jul 04 2018
Machine Learning
Quasi-Monte Carlo Variational Inference
Quasi-Monte Carlo (QMC) sampling replaces N i.i.d. samples from a uniform probability distribution by a deterministic sequence of samples of length N. We prove that this way, our algorithm can converge at a faster rate than SGD.
Tue Dec 26 2017
Machine Learning
Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors
Entropy-SGD (Chaudhari et al., 2017) optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier. Entropy-SGLD can be configured to yield relatively tight generalization bounds.
Fri Dec 04 2020
Machine Learning
Characterization of Excess Risk for Locally Strongly Convex Population Risk
We establish upper bounds for the expected excess risk of models trained by proper iterative algorithms which approximate the global minima under convex (resp. non-convex) loss functions. In contrast to the existing bounds, our results are not limited to a specific algorithm, e.g., stochastic gradient descent.