Published on Sun Sep 09 2018

Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, Michael Bowling

Monte Carlo Counterfactual Regret Minimization (MCCFR), a common method for learning strategies in imperfect information games from sampled play, can have slow long-term convergence rates due to high variance. In this paper, we introduce a variance-reduction technique that applies to any sampling variant of MCCFR.

Abstract

Learning strategies for imperfect information games from samples of interaction is a challenging problem. A common method for this setting, Monte Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term convergence rates due to high variance. In this paper, we introduce a variance reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR. Using this technique, per-iteration estimated values and updates are reformulated as a function of sampled values and state-action baselines, similar to their use in policy gradient reinforcement learning. The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates. Finally, we show that given a perfect baseline, the variance of the value estimates can be reduced to zero. Experimental evaluation shows that VR-MCCFR brings an order-of-magnitude speedup, while the empirical variance decreases by three orders of magnitude. The decreased variance allows CFR+ to be used with sampling for the first time, increasing the speedup to two orders of magnitude.
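The reformulated estimator described in the abstract is a control-variate construction: the sampled action's value is the baseline plus an importance-weighted residual, while unsampled actions fall back to the baseline alone. The following is a minimal Python sketch of that idea, not the paper's implementation; the names (estimate_action_value, baseline, sample_prob) and the toy two-action example are illustrative assumptions. It numerically checks the two properties stated above: the estimate stays unbiased for any baseline, and a perfect baseline drives the variance to zero.

```python
import random

# Minimal sketch (not the paper's code) of a baseline-corrected value
# estimate as described in the abstract. All names below are illustrative
# assumptions, not identifiers from VR-MCCFR.

def estimate_action_value(action, sampled_action, sampled_return, baseline, sample_prob):
    """Unbiased per-action value estimate using a state-action baseline.

    The sampled action gets the baseline plus an importance-weighted
    residual; unsampled actions fall back to the baseline alone.
    """
    if action == sampled_action:
        return baseline[action] + (sampled_return - baseline[action]) / sample_prob[action]
    return baseline[action]


if __name__ == "__main__":
    # Toy single-decision example: two actions with known true values.
    # In the real algorithm the sampled return is itself a (bootstrapped)
    # estimate from the rest of the trajectory; here it is exact for clarity.
    true_value = {"call": 1.0, "fold": -0.5}
    sample_prob = {"call": 0.3, "fold": 0.7}

    for label, baseline in [("zero baseline", {"call": 0.0, "fold": 0.0}),
                            ("perfect baseline", dict(true_value))]:
        estimates = []
        for _ in range(100_000):
            sampled = "call" if random.random() < sample_prob["call"] else "fold"
            estimates.append(estimate_action_value(
                "call", sampled, true_value[sampled], baseline, sample_prob))
        mean = sum(estimates) / len(estimates)
        var = sum((e - mean) ** 2 for e in estimates) / len(estimates)
        # Mean stays near the true value (1.0) in both cases (unbiased);
        # variance collapses to zero with the perfect baseline.
        print(f"{label}: mean={mean:.3f}, variance={var:.3f}")
```

With the zero baseline, the variance comes entirely from the 1/sample_prob importance correction; with the perfect baseline the residual vanishes, which is the zero-variance property the abstract refers to.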

Mon Jul 22 2019
Artificial Intelligence
Low-Variance and Zero-Variance Baselines for Extensive-Form Games
Extensive-form games (EFGs) are a common model of multi-agent interactions with imperfect information. State-of-the-art algorithms for solving these games typically perform full walks of the game tree. We propose new baseline functions that result in significantly reduced variance compared to existing techniques.
Wed Oct 10 2018
Machine Learning
Lazy-CFR: fast and near optimal regret minimization for extensive games with imperfect information
Counterfactual regret minimization (CFR) is the most popular algorithm for two-player zero-sum extensive games with imperfect information. CFR has to traverse the whole game tree in each round, which is time-consuming in large-scale games. Lazy-CFR needs to traverse only a small fraction of the information sets in each round.
Thu Aug 27 2020
Artificial Intelligence
The Advantage Regret-Matching Actor-Critic
Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior.
Tue Sep 11 2018
Artificial Intelligence
Solving Imperfect-Information Games via Discounted Regret Minimization
Counterfactual regret minimization (CFR) is the most popular and, in practice, fastest approach to solving large imperfect-information games. In this paper we introduce novel CFR variants that discount regrets from earlier iterations in various ways.
Mon Sep 12 2016
Artificial Intelligence
Reduced Space and Faster Convergence in Imperfect-Information Games via Regret-Based Pruning
Counterfactual Regret Minimization (CFR) is the most popular iterative algorithm for solving zero-sum imperfect-information games. Regret-Based Pruning (RBP) is an improvement that allows poorly-performing actions to be temporarily pruned.
Wed Feb 19 2020
Artificial Intelligence
Stochastic Regret Minimization in Extensive-Form Games
Monte-Carlo counterfactual regret minimization (MCCFR) is the state-of-the-art algorithm for solving sequential games that are too large for full tree traversals. However, stochastic methods for sequential games have not been investigated extensively beyond MCCFR.