Published on Thu Jun 18 2020

Stochastic bandits with arm-dependent delays

Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko

Stochastic delayed bandit algorithms are limited by strong assumptions on the delay distributions. In this work, we weaken them significantly and only assume that there is a bound on the tail of the delay.

Abstract

Significant work has recently been dedicated to the stochastic delayed bandit setting because of its relevance in applications. However, the applicability of existing algorithms is restricted by strong assumptions that are often made on the delay distributions, such as full observability, restrictive shape constraints, or uniformity over arms. In this work, we weaken them significantly and only assume that there is a bound on the tail of the delay. In particular, we cover the important case where the delay distributions vary across arms, and the case where the delays are heavy-tailed. Addressing these difficulties, we propose a simple but efficient UCB-based algorithm called PatientBandits. We provide both problem-dependent and problem-independent bounds on the regret, as well as performance lower bounds.
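To make the setting concrete, here is a minimal sketch of a UCB-style learner under arm-dependent stochastic delays: the reward of each pull arrives only after a random, arm-specific delay, and the optimistic index is computed from the observations that have arrived so far. The names (`delayed_ucb`, `delay_samplers`) are hypothetical, and the index is a generic delayed UCB rather than the PatientBandits algorithm itself, which additionally exploits the assumed bound on the delay tails.

```python
import heapq
import math
import random

def delayed_ucb(n_rounds, arms, delay_samplers, c=2.0):
    """Generic delayed-UCB sketch: the index is built only from rewards
    that have already arrived. An illustration, not the paper's
    PatientBandits algorithm."""
    K = len(arms)
    obs_sum = [0.0] * K   # rewards that have arrived so far, per arm
    obs_cnt = [0] * K     # number of arrived observations, per arm
    plays = [0] * K       # number of pulls, per arm
    pending = []          # min-heap of (arrival_round, arm, reward)

    for t in range(1, n_rounds + 1):
        # Deliver any feedback whose delay has elapsed by round t.
        while pending and pending[0][0] <= t:
            _, a, r = heapq.heappop(pending)
            obs_sum[a] += r
            obs_cnt[a] += 1

        if min(obs_cnt) == 0:
            # An arm with no arrived feedback yet: keep exploring it.
            arm = obs_cnt.index(0)
        else:
            # Optimistic index computed on the arrived observations only.
            arm = max(range(K), key=lambda a: obs_sum[a] / obs_cnt[a]
                      + math.sqrt(c * math.log(t) / obs_cnt[a]))

        reward = arms[arm]()           # draw a stochastic reward
        delay = delay_samplers[arm]()  # arm-dependent random delay
        heapq.heappush(pending, (t + delay, arm, reward))
        plays[arm] += 1
    return plays

if __name__ == "__main__":
    # Toy run: Bernoulli arms with bounded uniform delays (hypothetical).
    arms = [lambda p=p: float(random.random() < p) for p in (0.3, 0.5, 0.7)]
    delays = [lambda m=m: random.randint(0, m) for m in (2, 10, 50)]
    print(delayed_ucb(5000, arms, delays))
```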

Related papers

Fri Jun 04 2021
Machine Learning
Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
We study the stochastic Multi-Armed Bandit (MAB) problem with random delays. We consider two settings: the reward-dependent delay setting and the reward-independent delay setting. Our main contributions are algorithms that achieve near-optimal regret in each setting.
Sat May 22 2021
Machine Learning
Combinatorial Blocking Bandits with Stochastic Delays
We consider the general combinatorial setting where more than one arm can be played at each round, and we allow the blocking time of each arm to be stochastic. We provide a tight analysis of the approximation guarantee of a natural greedy heuristic that always plays the feasible subset with maximum expected reward.
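As an illustration of one such greedy step, the sketch below plays, at each round, the highest-mean arms that are not currently blocked, then re-blocks each played arm for a stochastic duration. The feasibility constraint is simplified to a cardinality budget, and all names (`greedy_blocking_round`, `blocked_until`) are hypothetical; the paper's setting allows more general combinatorial constraints.

```python
def greedy_blocking_round(t, means, blocked_until, delay_samplers, budget):
    """One hypothetical greedy round: among the arms whose blocking time
    has expired, play the `budget` arms with the highest (estimated)
    mean reward, then block each played arm for a random duration."""
    available = [a for a in range(len(means)) if blocked_until[a] <= t]
    chosen = sorted(available, key=lambda a: means[a], reverse=True)[:budget]
    for a in chosen:
        blocked_until[a] = t + delay_samplers[a]()  # stochastic blocking time
    return chosen
```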
Wed Feb 06 2013
Machine Learning
Bounded regret in stochastic multi-armed bandits
We study the stochastic multi-armed bandit problem when one knows the value of an optimal arm. We propose a new randomized policy that attains a regret uniformly bounded over time.
Tue May 20 2014
Machine Learning
Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms
We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. Our algorithm optimally exploits the unimodal structure of the problem and, surprisingly, its regret does not depend on the number of arms.
Wed Sep 07 2016
Artificial Intelligence
Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem
We consider a non-stationary formulation of the stochastic multi-armed bandit task, in which the rewards are no longer assumed to be identically distributed. We also show that the original Successive Elimination algorithm fails to achieve controlled regret.
Mon Oct 12 2020
Machine Learning
Adapting to Delays and Data in Adversarial Multi-Armed Bandits
We consider the adversarial multi-armed bandit problem under delayed feedback. We analyze variants of the Exp3 algorithm that tune their step-size. We obtain regret guarantees that adapt to the observed (rather than worst-case) sequences of delays.
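A minimal loss-based Exp3 sketch under delayed feedback is shown below: arms are sampled from the exponential-weights distribution, and the importance-weighted update is applied only when the feedback for a past round arrives. The fixed step-size `eta` is a simplification, since the paper's variants adapt it to the observed delays, and all names here are hypothetical.

```python
import math
import random

def exp3_sample(weights):
    """Sample an arm from the exponential-weights distribution and
    return it together with its sampling probability."""
    total = sum(weights)
    probs = [w / total for w in weights]
    arm = random.choices(range(len(weights)), weights=probs)[0]
    return arm, probs[arm]

def exp3_delayed_update(weights, arm, prob, loss, eta):
    """Importance-weighted exponential update, applied whenever the
    (arm, prob, loss) feedback for some earlier round finally arrives.
    A fixed `eta` is used here; adaptive variants would tune it based
    on the observed delays instead."""
    weights[arm] *= math.exp(-eta * loss / prob)
```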