Published on Mon Jul 20 2020

Filtered Poisson Process Bandit on a Continuum

James A. Grant, Roberto Szechtman

Abstract

We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over T rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys O(T^(2/3)) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor.
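
As a rough sketch of the discretise-then-UCB approach described in the abstract, the Python snippet below runs a UCB index over a fixed uniform grid of K ≈ T^(1/3) actions with Poisson count rewards. It is illustrative only: the sampler sample_count, the toy intensity mean_points, and the simplified confidence bonus are assumptions for this sketch, and the paper's algorithm uses a data-adaptive (rather than fixed) discretisation together with Poisson-specific confidence bounds.

import numpy as np

def ucb_continuum_poisson(T, sample_count, num_arms=None):
    """UCB over a uniform discretisation of the action space [0, 1].

    sample_count(x) should return a Poisson count whose (unknown) mean is
    the expected number of revealed points when action x is played.
    """
    # Under a Lipschitz reward function, K ~ T^(1/3) arms balances the
    # discretisation error against the per-arm estimation error, which
    # is the trade-off behind the O(T^(2/3)) regret rate.
    K = num_arms or int(np.ceil(T ** (1.0 / 3.0)))
    centres = (np.arange(K) + 0.5) / K   # midpoints of the K bins
    pulls = np.zeros(K)                  # number of times each arm was played
    totals = np.zeros(K)                 # total points observed per arm
    for t in range(T):
        if t < K:
            arm = t                      # play every arm once to initialise
        else:
            means = totals / pulls
            # Simplified bonus exploiting the Poisson mean-variance identity;
            # the paper's index uses sharper Poisson-specific concentration.
            bonus = np.sqrt(2.0 * np.maximum(means, 1.0) * np.log(t + 1) / pulls)
            arm = int(np.argmax(means + bonus))
        totals[arm] += sample_count(centres[arm])
        pulls[arm] += 1
    return centres, totals / pulls

# Toy example: the expected number of revealed points peaks at x = 0.7.
rng = np.random.default_rng(1)
mean_points = lambda x: 5.0 * np.exp(-20.0 * (x - 0.7) ** 2)
centres, estimates = ucb_continuum_poisson(5000, lambda x: rng.poisson(mean_points(x)))
print("best estimated action:", centres[np.argmax(estimates)])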

Mon Jul 17 2017
Artificial Intelligence
Online Multi-Armed Bandit
We introduce a novel variant of the multi-armed bandit problem. Bandits are streamed one at a time to the player. At each point, the player can either choose to pull the current bandit or move on to the next bandit.

Mon Feb 08 2021
Machine Learning
Correlated Bandits for Dynamic Pricing via the ARC algorithm
The Asymptotic Randomised Control (ARC) algorithm provides a rigorous approximation to the optimal strategy for a wide class of Bayesian bandits. The algorithm is guaranteed to asymptotically optimise the expected discounted payoff.

Sat Jan 22 2005
Machine Learning
Bandit Problems with Side Observations

Tue Jul 10 2018
Machine Learning
Bandits with Side Observations: Bounded vs. Logarithmic Regret
We consider the classical stochastic multi-armed bandit but where, from time to time, an extra observation is gathered by the agent for free. We prove that, no matter how small the frequency of these free observations is, an agent can ensure a regret uniformly bounded in time.

Sat Sep 08 2012
Machine Learning
Bandits with heavy tail
The stochastic multi-armed bandit problem is well understood when the reward distributions are sub-Gaussian. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as in the sub-Gaussian case.

Tue May 25 2021
Machine Learning
Bias-Robust Bayesian Optimization via Dueling Bandit
We consider Bayesian optimization in settings where observations can be adversarially biased. Our analysis further generalizes a previously proposed linear bandit model to non-linear reward functions. We obtain the first efficient kernelized algorithm for dueling bandits.