Published on Sat Feb 06 2016

Importance Sampling for Minibatches

Dominik Csiba, Peter Richtárik

Abstract

Minibatching is a very well studied and highly popular technique in supervised learning, used by practitioners for its ability to accelerate training through better utilization of parallel processing power and reduction of stochastic variance. Another popular technique is importance sampling -- a strategy for preferential sampling of more important examples, also capable of accelerating the training process. However, despite considerable effort by the community in these areas, and due to the inherent technical difficulty of the problem, there is no existing work combining the power of importance sampling with the strength of minibatching. In this paper we propose the first importance sampling scheme for minibatches and give a simple and rigorous complexity analysis of its performance. We illustrate on synthetic problems that, for training data with certain properties, our sampling can lead to several orders of magnitude improvement in training time. We then test the new sampling on several popular datasets and show that the improvement can reach an order of magnitude.
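To make the combination concrete, here is a minimal sketch (not the authors' algorithm) of the basic mechanism: draw a minibatch with probabilities proportional to per-example importance scores, then reweight each sampled term by 1/(n·p_i) so the minibatch estimate stays unbiased. The scores here are hypothetical stand-ins for whatever importance measure is used (e.g., bounds on per-example gradient norms).

```python
import numpy as np

def importance_minibatch(scores, batch_size, rng):
    """Sample a minibatch of indices with probability proportional to
    `scores`, returning the indices and unbiasedness weights 1 / (n * p_i)."""
    n = len(scores)
    p = scores / scores.sum()
    idx = rng.choice(n, size=batch_size, replace=True, p=p)
    weights = 1.0 / (n * p[idx])
    return idx, weights

# Toy check of unbiasedness: estimate the mean of x from weighted minibatches.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000) ** 2    # nonnegative per-example "losses"
scores = x + 1e-3                     # hypothetical importance scores
est = []
for _ in range(2000):
    idx, w = importance_minibatch(scores, batch_size=8, rng=rng)
    est.append(np.mean(w * x[idx]))   # E[w_i * x_i] = mean(x) by construction
print(abs(np.mean(est) - x.mean()))
```

Because the scores here are nearly proportional to the summands, each weighted term w_i·x_i is close to constant, so the variance of the minibatch estimate is far smaller than under uniform sampling -- this variance reduction is the source of the speedups the abstract describes.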

Mon Sep 16 2019
Machine Learning
Weighted Sampling for Combined Model Selection and Hyperparameter Tuning
The combined algorithm selection and hyperparameter tuning (CASH) problem is characterized by large hierarchical hyperparameter spaces. Model-free hyperparameter tuning methods are highly parallelizable across multiple machines. When no prior knowledge or meta-data exists to boost their performance, these methods commonly sample configurations at random.
0
0
0
Thu Jan 18 2018
Artificial Intelligence
Faster Learning by Reduction of Data Access Time
Training time has two major components: the time to access the data and the time to process (learn from) it. Methods such as SAG, SAGA, SVRG, SAAG-II and MBSGD have been applied to this problem.
Wed Mar 09 2016
Machine Learning
Starting Small -- Learning with Adaptive Sample Sizes
For many machine learning problems, data is abundant and it may be prohibitive to make multiple passes through the full training set. We investigate strategies for dynamically increasing the effective sample size.
Wed Oct 31 2018
Machine Learning
On Exploration, Exploitation and Learning in Adaptive Importance Sampling
We study adaptive importance sampling (AIS) as an online learning problem. Borrowing ideas from the bandit literature, we propose Daisee, a partition-based AIS algorithm, and then extend it to adaptively learn a hierarchical sample space.
Wed Dec 26 2018
Machine Learning
BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees
BlinkML is a system for fast, quality-guaranteed ML training. It can speed up the training of large-scale ML tasks by 6.26x-629x. It guarantees the same predictions, with 95% probability, as the full model.
Tue Apr 27 2021
Machine Learning
One Backward from Ten Forward, Subsampling for Large-Scale Deep Learning