Published on Tue Feb 04 2014

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

Abstract

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that chosen action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only Õ(√(KT / log N)) oracle calls across all T rounds, where N is the number of policies in the policy class we compete against. By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes. We further conduct a proof-of-concept experiment which demonstrates the excellent computational and prediction performance of (an online variant of) our algorithm relative to several baselines.
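
To make the setting concrete, the sketch below illustrates the interaction protocol the abstract describes: observe a context, choose one of K actions, and see the reward for that action only, with exploration handled by an action distribution and learning delegated to an oracle over a policy class. This is a minimal illustration, not the paper's algorithm; the toy environment, the constant-action policy class, the epsilon-greedy schedule, and the brute-force "oracle" are all assumptions made for the demo.

import random

K = 3            # number of actions (small, purely for illustration)
T = 2000         # number of rounds

# A tiny finite policy class: each policy maps a context to an action.
# Here policy a ignores the context and always plays action a (illustrative only).
policies = [lambda x, a=a: a for a in range(K)]

def cost_sensitive_oracle(history):
    # Stand-in for the cost-sensitive classification oracle: return the policy
    # with the highest importance-weighted reward on the logged history.
    def value(pi):
        return sum(r / p for (x, a, r, p) in history if pi(x) == a)
    return max(policies, key=value)

def environment(x, a):
    # Toy environment (an assumption for this demo): the best action is a
    # simple function of the context, and the reward is 1 when it is chosen.
    best = int(x[0] > 0.5) + int(x[1] > 0.5)
    return 1.0 if a == best else 0.0

history, total_reward = [], 0.0
for t in range(1, T + 1):
    x = (random.random(), random.random())        # observe a context
    greedy = cost_sensitive_oracle(history)(x) if history else 0
    eps = min(1.0, K / t ** 0.5)                  # simple exploration schedule
    probs = [eps / K] * K
    probs[greedy] += 1.0 - eps
    a = random.choices(range(K), weights=probs)[0]
    r = environment(x, a)                         # reward observed for the chosen action only
    history.append((x, a, r, probs[a]))
    total_reward += r

print(f"average reward over {T} rounds: {total_reward / T:.3f}")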

Sat Feb 06 2016
Machine Learning
BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits
We present efficient algorithms for the problem of contextual bandits. Our algorithm BISTRO requires d calls to the empirical risk minimization (ERM) oracle per round. The method uses unlabeled data to make the problem computationally simple.
Mon Feb 22 2010
Machine Learning
Contextual Bandit Algorithms with Supervised Learning Guarantees
We address the problem of learning in an online, bandit setting. The learner must repeatedly select among K actions, but only receives partial feedback. We establish two new facts: First, using a new algorithm called Exp4.P, we show that it is possible to compete with the best in a set of N experts.
Fri May 24 2019
Machine Learning
OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits
We consider the stochastic linear (multi-armed) contextual bandit problem. Algorithms that are designed solely for one of the regimes are known to be sub-optimal for the other. We design a single computationally efficient algorithm that simultaneously obtains problem-dependent optimal regret
Wed Mar 04 2020
Machine Learning
Taking a hint: How to leverage loss predictors in contextual bandits?
We study learning in contextual bandits with the help of loss predictors. The main question is whether one can improve over the minimax regret for learning over T rounds. We provide upper and lower bounds for various settings: adversarial versus stochastic environments.
Mon Feb 23 2015
Machine Learning
Contextual Dueling Bandits
We consider the problem of learning to choose actions using contextual information when provided with limited feedback. We propose a new and natural solution concept, rooted in game theory, called a von Neumann winner, and present efficient algorithms for online learning in this setting; the first achieves particularly low regret, even when data is adversarial.
Mon Jun 13 2011
Artificial Intelligence
Efficient Optimal Learning for Contextual Bandits
We address the problem of learning in an online setting. We provide the first efficient algorithm with optimal regret. Our algorithm uses a cost-sensitive classification learner as an oracle.