Published on Tue Nov 23 2010

Concentration inequalities of the cross-validation estimate for stable predictors

Matthieu Cornec


Abstract

In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for stable predictors in the context of risk assessment. The notion of stability was first introduced by DEWA79 and extended by KEA95, BE01 and KUNIY02 to characterize classes of predictors with infinite VC dimension. In particular, this covers k-nearest-neighbor rules, Bayesian algorithms (KEA95), boosting, etc. General loss functions and classes of predictors are considered. We use the formalism introduced by DUD03 to cover a large variety of cross-validation procedures, including leave-one-out cross-validation, k-fold cross-validation, hold-out cross-validation (or split sample), and leave-p-out cross-validation. In particular, we give a simple rule for choosing the cross-validation procedure, depending on the stability of the class of predictors. In the special case of uniform stability, an interesting consequence is that the number of elements in the test set is not required to grow to infinity for the consistency of the cross-validation procedure. In this special case, the particular interest of leave-one-out cross-validation is emphasized.
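The procedures named in the abstract all fit one template: pick a collection of train/test splits, fit on each training part, and average the test error. The following is a minimal sketch of that template (not code from the paper; the names `cv_estimate`, `kfold_splits`, etc. are our own), in which leave-one-out, k-fold, and hold-out differ only in the splitting scheme, illustrated with a trivially stable learner (the training mean) under squared loss:

```python
import random

def cv_estimate(data, splits, fit, loss):
    """Cross-validation estimate: average test error over (train, test) splits."""
    errs = []
    for train_idx, test_idx in splits:
        model = fit([data[i] for i in train_idx])
        errs.append(sum(loss(model, data[i]) for i in test_idx) / len(test_idx))
    return sum(errs) / len(errs)

def kfold_splits(n, k):
    """k-fold: partition indices into k folds; each fold serves once as test set."""
    idx = list(range(n))
    folds = [idx[i::k] for i in range(k)]
    return [([j for j in idx if j not in set(fold)], fold) for fold in folds]

def loo_splits(n):
    """Leave-one-out is the k = n special case of k-fold."""
    return kfold_splits(n, n)

def holdout_split(n, n_test):
    """Hold-out (split sample): a single train/test split."""
    idx = list(range(n))
    return [(idx[:-n_test], idx[-n_test:])]

# Toy example: predict y by the training mean -- a uniformly stable learner.
random.seed(0)
data = [(None, random.gauss(0.0, 1.0)) for _ in range(20)]
fit = lambda train: sum(y for _, y in train) / len(train)
loss = lambda m, xy: (m - xy[1]) ** 2

loo = cv_estimate(data, loo_splits(len(data)), fit, loss)
k5 = cv_estimate(data, kfold_splits(len(data), 5), fit, loss)
hold = cv_estimate(data, holdout_split(len(data), 5), fit, loss)
```

The stability results discussed in the abstract concern how much `cv_estimate` can deviate from the true generalization error when the learner, like the training mean here, changes little under perturbation of one training point.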

Sat Oct 30 2010
Machine Learning
Concentration inequalities of the cross-validation estimator for Empirical Risk Minimiser
In the general setting, we prove sanity-check bounds in the spirit of KR99. General loss functions and classes of predictors with finite VC dimension are considered. We focus on proving the consistency of the various cross-validation procedures.
Sat May 20 2017
Machine Learning
-stability for cross-validation and the choice of the number of folds
In this paper, we introduce a new concept of stability for cross-validation. We use it as a new perspective to build a general theory for cross-validation. The new bounds quantify the stability of the one-round/average test error of the model class.
Fri Jul 24 2020
Machine Learning
Cross-validation Confidence Intervals for Test Error
This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions. Together, these results provide practical, asymptotically exact confidence intervals for k-fold test error.
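The confidence intervals this entry describes rest on a normal approximation to the pooled per-example test losses. A minimal sketch of such an interval (our own simplification, not the estimator from the paper, which additionally handles dependence between folds via its stability conditions):

```python
import math
import random
from statistics import NormalDist

def cv_confidence_interval(losses, level=0.95):
    """Normal-approximation confidence interval for test error, built from
    the pooled per-example test losses collected across all CV folds.
    Treats the losses as approximately i.i.d., which is what the weak
    stability conditions are used to justify asymptotically."""
    n = len(losses)
    mean = sum(losses) / n
    var = sum((x - mean) ** 2 for x in losses) / (n - 1)  # sample variance
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    half = z * math.sqrt(var / n)
    return mean - half, mean + half

# Toy usage: per-example squared errors pooled over folds.
random.seed(1)
losses = [random.gauss(1.0, 0.3) ** 2 for _ in range(200)]
lo95, hi95 = cv_confidence_interval(losses, level=0.95)
lo99, hi99 = cv_confidence_interval(losses, level=0.99)
```

Raising `level` widens the interval, since the standard error is scaled by a larger normal quantile.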
Tue Feb 05 2019
Machine Learning
Consistent Risk Estimation in Moderately High-Dimensional Linear Regression
Risk estimation is at the core of many learning systems. A unifying methodology with a rigorous theory is lacking in high-dimensional settings. This paper studies the problem of risk estimation under the moderately high-dimensional asymptotic setting
Mon Jun 19 2017
Machine Learning
An a Priori Exponential Tail Bound for k-Folds Cross-Validation
We consider a priori generalization bounds developed in terms of cross-validation estimates and the stability of learners. We use this exponential tail bound to analyze the concentration of the k-fold cross-validation estimate. This insight raises valid concerns related to the practical use of KFCV.
Wed Apr 14 2010
Machine Learning
Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory
In regular statistical models, the leave-one-out cross-validation is equivalent to the Akaike information criterion. However, since many learning machines are singular statistical models, the asymptotic behavior of the cross-validation remains unknown.