Published on Mon May 27 2019

Path Planning Problems with Side Observations: When Colonels Play Hide-and-Seek

Dong Quan Vu, Patrick Loiseau, Alonso Silva, Long Tran-Thanh

Abstract

Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are often used to model a large variety of practical problems, but only in their one-shot versions. Indeed, due to their extremely large strategy space, it remains an open question how one can efficiently learn in these games. In this work, we show that the online CB and HS games can be cast as path planning problems with side-observations (SOPPP): at each stage, a learner chooses a path on a directed acyclic graph and suffers the sum of losses that are adversarially assigned to the corresponding edges; and she then receives semi-bandit feedback with side-observations (i.e., she observes the losses on the chosen edges plus some others). We propose a novel algorithm, EXP3-OE, the first-of-its-kind with guaranteed efficient running time for SOPPP without requiring any auxiliary oracle. We provide an expected-regret bound of EXP3-OE in SOPPP matching the order of the best benchmark in the literature. Moreover, we introduce additional assumptions on the observability model under which we can further improve the regret bounds of EXP3-OE. We illustrate the benefit of using EXP3-OE in SOPPP by applying it to the online CB and HS games.
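To make the SOPPP setting concrete, here is a minimal sketch of exponential weights with importance-weighted loss estimates on a toy DAG. This is not the paper's EXP3-OE (which is designed to run efficiently without enumerating paths); it is a naive path-enumeration illustration of the same idea, and the graph, the side-observation sets, and all names below are invented for the demo: each round the learner samples a path, observes the losses of its edges plus some side-observed edges, and divides each observed loss by the probability that it was observed before the multiplicative update.

```python
import math
import random

# Toy DAG with edges a..e; the three source-to-sink paths are fixed here.
PATHS = [("a", "c"), ("a", "e", "d"), ("b", "d")]
EDGES = ["a", "b", "c", "d", "e"]
# Hypothetical side-observation structure: playing path i also reveals
# the losses of these extra edges (in addition to the path's own edges).
SIDE_OBS = {0: {"b"}, 1: set(), 2: {"a"}}

def observed_edges(path_idx):
    """Edges whose losses are revealed when this path is played."""
    return set(PATHS[path_idx]) | SIDE_OBS[path_idx]

def run(losses, eta=0.3, seed=0):
    """Exponential weights over paths; loss estimates are importance-
    weighted by the probability that each edge is observed, so side
    observations reduce the variance of the estimates."""
    rng = random.Random(seed)
    w = [1.0] * len(PATHS)
    total_loss = 0.0
    for loss in losses:  # loss: dict edge -> adversarial loss in [0, 1]
        s = sum(w)
        p = [wi / s for wi in w]
        # q[e] = probability that edge e's loss is observed this round
        q = {e: sum(p[i] for i in range(len(PATHS))
                    if e in observed_edges(i))
             for e in EDGES}
        i = rng.choices(range(len(PATHS)), weights=p)[0]
        total_loss += sum(loss[e] for e in PATHS[i])
        # Unbiased estimate: observed losses divided by observation prob.
        est = {e: loss[e] / q[e] if e in observed_edges(i) else 0.0
               for e in EDGES}
        for j, path in enumerate(PATHS):
            w[j] *= math.exp(-eta * sum(est[e] for e in path))
    return total_loss, w
```

Enumerating paths is exponential in general, which is exactly why the paper's contribution of an oracle-free, efficiently implementable algorithm matters; this sketch only shows the feedback and estimation structure.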

Fri Jul 12 2019
Machine Learning
Exploration by Optimisation in Partial Monitoring
We provide a simple and efficient algorithm for adversarial k-action d-outcome non-degenerate locally observable partial monitoring games. The same algorithm also achieves near-optimal regret for full-information, bandit, and globally observable games.
Wed Feb 24 2021
Machine Learning
Combinatorial Pure Exploration with Bottleneck Reward Function and its Extension to General Reward Functions
In this paper, we study the Combinatorial Pure Exploration problem with the bottleneck reward function (CPE-B) under the fixed-confidence and fixed-budget settings. CPE-B captures a variety of practical scenarios such as network routing in communication networks.
Mon Dec 07 2020
Machine Learning
Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
Mon Jun 13 2011
Machine Learning
From Bandits to Experts: On the Value of Side-Observations
We consider an adversarial online learning setting where a decision maker can choose an action at every stage of the game. The decision maker gets side observations of the rewards he would have obtained had he chosen some of the other actions. We develop practical algorithms with provable regret guarantees.
Wed Aug 24 2011
Machine Learning
Non-trivial two-armed partial-monitoring games are bandits
We consider online learning in partial-monitoring games against an oblivious adversary. We show that when the learner has only two actions available and the game is non-trivial, it is reducible to a bandit-like game.
Mon Mar 08 2021
Machine Learning
Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games