Published on Mon May 18 2020

Meta-learning with Stochastic Linear Bandits

Leonardo Cella, Alessandro Lazaric, Massimiliano Pontil
Abstract

We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
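The core ingredient described above — ridge regression biased toward a vector h — admits a simple closed form: the estimate solves min_theta ||X theta - r||^2 + lam ||theta - h||^2, giving theta = (X^T X + lam I)^{-1} (X^T r + lam h). Below is a minimal sketch of this biased estimator together with the optimistic (UCB-style) arm score used in OFUL-type algorithms; it is illustrative only, not the paper's exact algorithm, and the names `biased_ridge`, `lam`, and `beta` are our own.

```python
import numpy as np

def biased_ridge(X, r, h, lam):
    """Biased ridge estimate:
    argmin_theta ||X theta - r||^2 + lam * ||theta - h||^2,
    closed form (X^T X + lam I)^{-1} (X^T r + lam h).
    With h = 0 this reduces to ordinary ridge regression, as in standard OFUL."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ r + lam * h)

def ucb_score(x, theta_hat, A_inv, beta):
    """Optimistic score of an arm with feature vector x:
    predicted reward plus an exploration bonus scaled by beta."""
    return x @ theta_hat + beta * np.sqrt(x @ A_inv @ x)
```

Note how the bias acts: with identity features `X = I`, `r = [1, 2]`, and `lam = 1`, the unbiased estimate (h = 0) shrinks to `[0.5, 1.0]`, while setting `h = [1, 2]` pulls it back to `[1.0, 2.0]`. When tasks are drawn from a low-variance distribution, a good bias h (estimated across tasks) plays exactly this role.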

Mon Dec 19 2016
Machine Learning
Corralling a Band of Bandit Algorithms
We study the problem of combining multiple bandit algorithms with the goal of creating a master that performs almost as well as the best base algorithm would if run on its own. The main challenge is that, when run under a master, base algorithms unavoidably receive much less feedback.
Thu Apr 04 2019
Machine Learning
Empirical Bayes Regret Minimization
Most bandit algorithm designs are purely theoretical. They have strong regret guarantees, but also are often too conservative in practice. We report several-fold reductions in Bayes regret for state-of-the-art bandit algorithms.
Thu Sep 10 2020
Machine Learning
A Markov Decision Process Approach to Active Meta Learning
One challenge in meta-learning is how to exploit relationships between tasks and classes. We develop scheduling schemes based on Upper Confidence Bound (UCB), the Gittins index, and Markov Decision Processes. We observe significant reductions in sample complexity relative to cyclic or i.i.d. sampling.
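To make the UCB-based scheduling idea concrete, here is a minimal sketch in the spirit of UCB1: tasks are treated as arms, and at each round the scheduler samples the task with the highest upper confidence bound on an observed progress signal. This is an illustrative toy (the progress signal is modeled as a Bernoulli draw with per-task means), not the scheme from the paper.

```python
import math
import random

def ucb1_schedule(task_means, T, seed=0):
    """Toy UCB1 task scheduler.

    task_means: per-task probability that a sampled example yields 'progress'
                (a stand-in for whatever learning signal the scheduler observes).
    Returns how many times each task was selected over T rounds.
    """
    rng = random.Random(seed)
    K = len(task_means)
    counts = [0] * K          # times each task was scheduled
    sums = [0.0] * K          # accumulated progress signal per task
    for t in range(1, T + 1):
        if t <= K:
            k = t - 1         # play each task once to initialize
        else:
            # UCB1 index: empirical mean + exploration bonus
            k = max(range(K),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < task_means[k] else 0.0
        counts[k] += 1
        sums[k] += reward
    return counts
```

Run on two tasks with means 0.9 and 0.1, the scheduler quickly concentrates its budget on the more informative task — the mechanism behind the sample-complexity savings over cyclic or i.i.d. sampling.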
Fri May 28 2021
Machine Learning
Efficient Online-Bandit Strategies for Minimax Learning Problems
Several learning problems involve solving min-max problems, e.g., empirical distributionally robust learning or learning with non-standard aggregated losses. We argue that the efficiency of such approaches critically depends on the structure of the set over which the min-max problem is defined. We propose two properties of the set that facilitate designing efficient algorithms.
Thu Aug 19 2021
Machine Learning
Learning-to-learn non-convex piecewise-Lipschitz functions
We analyze the meta-learning of the initialization and step-size of learning algorithms for piecewise-Lipschitz functions. Asymptotically, we guarantee that the average regret across tasks scales with a natural notion of task similarity.
Tue Oct 22 2019
Machine Learning
No-regret Non-convex Online Meta-Learning
The online meta-learning framework is designed for the continual lifelong learning setting. It bridges two fields: meta-learning, which tries to extract prior knowledge from past tasks for fast learning of future tasks, and online learning, which deals with the sequential setting where problems are revealed one by one.