Published on Thu Feb 23 2017

Rotting Bandits

Nir Levine, Koby Crammer, Shie Mannor

Abstract

The Multi-Armed Bandits (MAB) framework highlights the tension between acquiring new knowledge (Exploration) and leveraging available knowledge (Exploitation). In the classical MAB problem, a decision maker must choose an arm at each time step, upon which she receives a reward. The decision maker's objective is to maximize her cumulative expected reward over the time horizon. The MAB problem has been studied extensively, specifically under the assumption of the arms' rewards distributions being stationary, or quasi-stationary, over time. We consider a variant of the MAB framework, which we termed Rotting Bandits, where each arm's expected reward decays as a function of the number of times it has been pulled. We are motivated by many real-world scenarios such as online advertising, content recommendation, crowdsourcing, and more. We present algorithms, accompanied by simulations, and derive theoretical guarantees.
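
To make the setting concrete, here is a minimal sketch of a rotting bandit simulation: each arm's expected reward decays with its own pull count, and a simple sliding-window heuristic decides which arm to pull next. The power-law decay, noise model, horizon, and the heuristic itself are illustrative assumptions for this sketch; they are not the algorithms or guarantees from the paper.

```python
# Minimal rotting-bandit sketch (illustrative assumptions, not the paper's algorithms).
import numpy as np

rng = np.random.default_rng(0)

class RottingBandit:
    """K arms whose expected reward decays with the number of times each arm
    has been pulled (illustrative power-law decay)."""

    def __init__(self, base, decay, noise_std=0.1):
        self.base = np.asarray(base, dtype=float)    # initial expected rewards
        self.decay = np.asarray(decay, dtype=float)  # per-arm decay exponents
        self.noise_std = noise_std
        self.pulls = np.zeros(len(self.base), dtype=int)

    def expected_reward(self, arm):
        # Expected reward after n prior pulls: base * (n + 1) ** (-decay).
        return self.base[arm] * (self.pulls[arm] + 1) ** (-self.decay[arm])

    def pull(self, arm):
        reward = self.expected_reward(arm) + rng.normal(0.0, self.noise_std)
        self.pulls[arm] += 1
        return reward


def sliding_window_policy(env, horizon, window=5):
    """Toy heuristic: pull each arm once, then keep pulling the arm whose
    average over its last `window` observed rewards is largest (recent pulls
    matter because rewards rot)."""
    k = len(env.base)
    history = [[] for _ in range(k)]
    total = 0.0
    for t in range(horizon):
        if t < k:                                   # initialize every arm once
            arm = t
        else:
            arm = int(np.argmax([np.mean(h[-window:]) for h in history]))
        reward = env.pull(arm)
        history[arm].append(reward)
        total += reward
    return total


env = RottingBandit(base=[1.0, 0.8, 0.6], decay=[0.5, 0.1, 0.0])
print("cumulative reward over 500 steps:", sliding_window_policy(env, horizon=500))
```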

Fri Aug 28 2015
Machine Learning
Multi-armed Bandit Problem with Known Trend
We consider a variant of the multi-armed bandit model, where the gambler knows the shape of the reward function of each arm but not its distribution. We propose a new algorithm named A-UCB that assumes a stochastic model.
Thu Feb 22 2018
Machine Learning
Regional Multi-Armed Bandits
We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter. We propose an efficient algorithm, UCB-g, that solves the regional bandit problem by combining the Upper Confidence Bound and greedy principles.
Wed Nov 06 2019
Machine Learning
Multi-Armed Bandits with Correlated Arms
We consider a multi-armed bandit framework where the rewards obtained by pulling different arms are correlated. We present fundamental generalizations of classic bandit algorithms to the correlated setting. We validate the proposed algorithms via experiments on the MovieLens and Goodreads datasets.
Tue Nov 27 2018
Machine Learning
Rotting bandits are not harder than stochastic ones
In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary. We consider the non-parametric rotting bandit setting, where rewards can only decrease. We introduce the filtering on expanding window average (FEWA) algorithm.
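
The snippet below is a rough sketch of the expanding-window idea named in this abstract: arm averages are compared over windows of increasing size, and arms whose recent averages look clearly worse are filtered out before the next pull. The confidence width and filtering rule here are illustrative assumptions, not a faithful reproduction of FEWA or its analysis.

```python
# Rough sketch of expanding-window filtering (not the exact FEWA algorithm).
import numpy as np

def choose_arm_expanding_window(history, sigma=0.1, delta=0.05):
    """history[i] is the list of rewards observed so far for arm i.

    Arms are compared on averages over their last h rewards for growing h;
    an arm is dropped once its windowed average falls well below the best
    one, and a least-sampled surviving arm is pulled next. The confidence
    width below is an illustrative choice."""
    pulls = [len(h) for h in history]
    if min(pulls) == 0:                          # initialization: try every arm once
        return pulls.index(0)
    active = set(range(len(history)))
    h = 1
    while h <= min(pulls[i] for i in active):
        means = {i: np.mean(history[i][-h:]) for i in active}
        width = sigma * np.sqrt(2.0 * np.log(1.0 / delta) / h)  # shrinks as h grows
        best = max(means.values())
        active = {i for i in active if means[i] >= best - 2.0 * width}
        h += 1
    return min(active, key=lambda i: pulls[i])   # least-pulled surviving arm


# Example: the arm with poor recent rewards is filtered out at larger windows.
hist = [[0.9, 0.7, 0.5], [0.6, 0.6, 0.6], [0.2, 0.2]]
print("next arm to pull:", choose_arm_expanding_window(hist))
```

A selection rule of this kind could be plugged into a simulation loop like the one sketched under the main abstract above.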
Wed Dec 23 2015
Artificial Intelligence
The Max K-Armed Bandit: PAC Lower Bounds and Efficient Algorithms
We consider the Max K-Armed Bandit problem, where a learning agent is faced with several stochastic arms. At each time step the agent chooses an arm, and observes the reward of the obtained sample. Our basic assumption is a known lower bound on the reward distributions.
Sun Mar 11 2018
Machine Learning
Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback
We study a multi-armed bandit problem in a dynamic environment where arm rewards evolve in a correlated fashion according to a Markov chain. Employing mixing-time bounds on Markov chains, we develop two learning algorithms called EpochUCB and EpochGreedy.