Published on Mon Jul 20 2020

Minimax Policy for Heavy-tailed Bandits

Lai Wei, Vaibhav Srivastava

Abstract

We study the stochastic Multi-Armed Bandit (MAB) problem under worst-case regret and heavy-tailed reward distributions. We modify the minimax policy MOSS for sub-Gaussian reward distributions by using a saturated empirical mean to design a new algorithm called Robust MOSS. We show that if a moment of order 1+ε exists for the reward distribution, then the refined strategy has a worst-case regret matching the lower bound while maintaining a distribution-dependent logarithmic regret.
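The mechanism is easy to sketch in code. Below is a minimal, illustrative Python sketch, not the paper's exact algorithm: saturated_empirical_mean clips observations before averaging, and moss_style_index plugs that estimator into the standard MOSS exploration bonus. The saturation threshold is a fixed hypothetical parameter here; Robust MOSS couples the saturation level to its analysis, and its bonus term differs in the details.

import numpy as np

def saturated_empirical_mean(rewards, threshold):
    # Clip (saturate) each observation at +/- threshold before averaging.
    # Saturation limits the influence of rare extreme rewards, which is
    # what makes an empirical mean usable under heavy-tailed noise.
    clipped = np.clip(np.asarray(rewards, dtype=float), -threshold, threshold)
    return clipped.mean()

def moss_style_index(rewards, horizon, num_arms, threshold):
    # MOSS-style upper index for one arm: estimated mean plus the MOSS
    # exploration bonus sqrt(max(log(n / (K * s)), 0) / s), where n is
    # the horizon, K the number of arms, and s the pull count (s > 0).
    s = len(rewards)
    bonus = np.sqrt(max(np.log(horizon / (num_arms * s)), 0.0) / s)
    return saturated_empirical_mean(rewards, threshold) + bonus

At each round the learner pulls the arm with the largest index. Replacing the plain empirical mean with the saturated one is what lets a MOSS-style analysis go through when rewards only have a moment of order 1+ε.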

Sun Feb 07 2021
Machine Learning
Regret Minimization in Heavy-Tailed Bandits
Sat Sep 08 2012
Machine Learning
Bandits with heavy tail
The stochastic multi-armed bandit problem is well understood when the reward distributions are sub-Gaussian. This paper examines the problem under the weaker assumption that the distributions have moments of order 1+ε. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian rewards.
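To see why order-2 moments recover the sub-Gaussian rate, consider the deviation bound behind truncated-mean estimators. Assuming a centered moment bound $\mathbb{E}|X-\mu|^{1+\varepsilon} \le v$, a truncated empirical mean $\hat{\mu}_s$ of $s$ samples satisfies, with probability at least $1-\delta$, a bound of the schematic form (constants omitted; a sketch of the standard analysis, not a quote from the paper):

$$ \hat{\mu}_s - \mu \;\le\; c\, v^{1/(1+\varepsilon)} \left( \frac{\log(1/\delta)}{s} \right)^{\varepsilon/(1+\varepsilon)}. $$

For $\varepsilon = 1$ (finite variance, i.e., moments of order 2) the width scales as $\sqrt{\log(1/\delta)/s}$, the same rate as a sub-Gaussian confidence interval, which is why regret bounds of the same order follow.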
Thu Oct 20 2016
Machine Learning
Combinatorial Multi-Armed Bandit with General Reward Functions
We study a stochastic combinatorial multi-armed bandit framework that allows a general nonlinear reward function. Existing techniques, which rely on accurate estimates of the means of the underlying random variables, do not work directly for such functions. We propose a new algorithm called stochastically dominant confidence bound.
Thu Dec 08 2011
Machine Learning
Extended UCB Policy for Multi-Armed Bandit with Light-Tailed Reward Distributions
In classic work, policies were proposed that achieve the optimal logarithmic regret order for certain special classes of light-tailed reward distributions. In this paper, we extend Auer et al.'s UCB1 index policy.
Sun Oct 29 2017
Machine Learning
Discrepancy-Based Algorithms for Non-Stationary Rested Bandits
We study the multi-armed bandit problem where the rewards are realizations of general non-stationary stochastic processes. We present a theoretical analysis and derive regret guarantees for rested bandits. We introduce a new algorithm based on classical UCB ideas combined with the notion of weighted discrepancy.
Tue Jun 16 2020
Machine Learning
Corralling Stochastic Bandit Algorithms
We study the problem of corralling stochastic bandit algorithms. We give two general algorithms for this setting, which we show enjoy favorable regret guarantees. We show that the regret of the corralled algorithms is no worse than that of the best individual algorithm.