Published on Wed Jul 04 2018

Factored Bandits

Julian Zimmert, Yevgeny Seldin


Abstract

We introduce the factored bandits model, a framework for learning with limited (bandit) feedback in which actions can be decomposed into a Cartesian product of atomic actions. Factored bandits include rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits, together with upper and lower regret bounds that match up to constants. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility-based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state-of-the-art algorithms (the additive terms are dominant up to time horizons that are exponential in the number of arms).
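To make the action structure concrete, here is a minimal toy sketch in Python. All names and numbers are hypothetical, and the explore-then-commit baseline is NOT the paper's algorithm; the sketch only illustrates the factored action space and the bandit-feedback protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: two factors with three atomic actions each.
# A composite action is a tuple (i, j); the learner only observes a
# noisy reward for the chosen tuple (bandit feedback), never per atom.
means = rng.uniform(0.0, 1.0, size=(3, 3))   # unknown mean-reward table

def pull(action):
    """Bandit feedback: noisy reward of the chosen atomic-action tuple."""
    return means[action] + rng.normal(0.0, 0.1)

# Naive explore-then-commit baseline (not the paper's method): sample
# tuples uniformly, then pick each factor's atom with the best empirical
# marginal reward, averaged over the other factor's atoms.
counts = np.zeros((3, 3))
sums = np.zeros((3, 3))
for _ in range(3000):
    action = (rng.integers(3), rng.integers(3))
    counts[action] += 1
    sums[action] += pull(action)

est = sums / np.maximum(counts, 1)
best = (est.mean(axis=1).argmax(), est.mean(axis=0).argmax())
print("marginal winners:", best,
      "true best tuple:", np.unravel_index(means.argmax(), means.shape))
```

Committing to per-factor marginal winners is only sound under identifiability assumptions on the reward function, which is precisely the kind of assumption the paper relaxes relative to rank-1 bandits.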

Sat May 23 2020
Machine Learning
A Novel Confidence-Based Algorithm for Structured Bandits
We study finite-armed stochastic bandits where the rewards of each arm might be correlated with those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
Tue Jun 16 2020
Machine Learning
Corralling Stochastic Bandit Algorithms
We study the problem of corralling stochastic bandit algorithms. We give two general algorithms for this setting and show that they benefit from favorable regret guarantees: the regret of the corralled algorithms is no worse than that of the best individual algorithm.
Wed May 14 2014
Machine Learning
Reducing Dueling Bandits to Cardinal Bandits
The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B". We present algorithms for reducing the Dueling Bandits problem to the traditional Multi-Armed Bandits problem.
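As a small illustration of the ordinal-feedback model, a duel can be simulated from hidden arm utilities. This is one common utility-based instance with a Bradley-Terry-style link, chosen here as an assumption; it is not the paper's reduction itself.

```python
import numpy as np

rng = np.random.default_rng(1)
utility = rng.uniform(0.0, 1.0, size=5)   # hidden arm utilities (assumed)

def duel(a, b):
    """Ordinal feedback only: True iff arm a beats arm b in one duel.
    Win probability uses a logistic (Bradley-Terry-style) link on the
    utility gap; the learner never observes the utilities themselves."""
    p_a_wins = 1.0 / (1.0 + np.exp(utility[b] - utility[a]))
    return rng.random() < p_a_wins
```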
Mon May 19 2014
Machine Learning
Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms
We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm. The set of arms is either discrete or continuous. For discrete bandits, we derive asymptotic problem-specific lower bounds for the regret satisfied by any algorithm.
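For concreteness, a discrete Lipschitz bandit instance might look like the following sketch, with arms embedded in [0, 1] and an assumed 1-Lipschitz mean-reward function (all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
arms = np.linspace(0.0, 1.0, 11)          # discrete arm set in [0, 1]
f = lambda x: 1.0 - abs(x - 0.3)          # 1-Lipschitz: |f(x)-f(y)| <= |x-y|

def pull(k):
    """Bandit feedback for arm index k: mean f(arms[k]) plus Gaussian noise."""
    return f(arms[k]) + rng.normal(0.0, 0.1)
```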
Tue Feb 04 2014
Machine Learning
Online Stochastic Optimization under Correlated Bandit Feedback
The high-confidence tree (HCT) algorithm is a novel anytime X-armed bandit algorithm. The main advantage of HCT is that it handles the challenging case of correlated rewards. HCT can be applied to the problem of policy search in reinforcement learning.
Thu Jul 02 2020
Machine Learning
Structure Adaptive Algorithms for Stochastic Bandits
We study reward maximisation in a wide class of structured stochastic multi-armed bandit problems. We develop asymptotically optimal algorithms from instance-dependent lower bounds using iterative saddle-point solvers. Our approach generalises recent iterative methods for pure exploration.