Published on Tue Nov 17 2020

C-Learning: Learning to Achieve Goals via Recursive Classification

Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

We study the problem of predicting and controlling the future state distribution of an autonomous agent. Our work lays a principled foundation for goal-conditioned RL as density estimation. This foundation makes hypotheses about Q-learning, including the optimal goal-sampling ratio, which we confirm experimentally.

Abstract

We study the problem of predicting and controlling the future state distribution of an autonomous agent. This problem, which can be viewed as a reframing of goal-conditioned reinforcement learning (RL), is centered around learning a conditional probability density function over future states. Rather than estimating this density function directly, we estimate it indirectly by training a classifier to predict whether an observation comes from the future. Via Bayes' rule, predictions from our classifier can be transformed into predictions over future states. Importantly, an off-policy variant of our algorithm allows us to predict the future state distribution of a new policy, without collecting new experience. This variant allows us to optimize functionals of a policy's future state distribution, such as the density of reaching a particular goal state. While conceptually similar to Q-learning, our work lays a principled foundation for goal-conditioned RL as density estimation, providing justification for goal-conditioned methods used in prior work. This foundation makes hypotheses about Q-learning, including the optimal goal-sampling ratio, which we confirm experimentally. Moreover, our proposed method is competitive with prior goal-conditioned RL methods.
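The Bayes'-rule step in the abstract is the standard density-ratio trick: if a classifier is trained to distinguish goals drawn from a policy's (discounted) future-state distribution from goals drawn from a marginal p(g), its odds recover the ratio between the two densities. The sketch below is only an illustration of that idea under assumptions of my own (an equal positive/negative class prior and a geometric offset for sampling on-policy positives); the function names are hypothetical, and this is not the authors' implementation.

import numpy as np

def future_state_density(classifier_prob, goal_marginal_density):
    # Convert a classifier probability C(s, a, g) that goal g came from the
    # future-state distribution (positive class) rather than the marginal
    # p(g) (negative class) into a density estimate via Bayes' rule:
    #     p_future(g | s, a) ~= C / (1 - C) * p(g),
    # assuming the classifier was trained with a 1:1 class prior.
    classifier_prob = np.asarray(classifier_prob, dtype=float)
    odds = classifier_prob / (1.0 - classifier_prob)
    return odds * goal_marginal_density

def sample_future_goal(trajectory, t, gamma, rng=None):
    # Draw a "positive" goal from the discounted future of time step t:
    # an offset Delta ~ Geometric(1 - gamma) makes trajectory[t + Delta] an
    # (approximate) sample from the discounted future-state distribution.
    # Clipping at the episode end introduces a small bias near termination.
    rng = np.random.default_rng() if rng is None else rng
    delta = rng.geometric(1.0 - gamma)            # Delta in {1, 2, ...}
    idx = min(t + delta, len(trajectory) - 1)
    return trajectory[idx]

In the simplest on-policy case, positives from sample_future_goal and negatives drawn from a buffer of visited states would train the classifier with a binary cross-entropy loss; future_state_density then turns its predictions into the density over future states whose functionals the abstract describes optimizing.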

Fri Jan 03 2020
Artificial Intelligence
Making Sense of Reinforcement Learning and Probabilistic Inference
Reinforcement learning (RL) combines a control problem with statistical estimation. In all but the simplest settings, the resulting inference is computationally intractable. We demonstrate that the popular 'RL as inference' approximation can perform poorly in even very basic problems.
Tue Mar 23 2021
Machine Learning
Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification
Thu Dec 12 2019
Machine Learning
Learning to Reach Goals via Iterated Supervised Learning
Supervised imitation learning provides a simple and stable alternative, but it requires access to demonstrations from a human supervisor. We propose a simple algorithm in which an agent continually imitates the trajectories it generates to progressively learn goal-reaching behaviors from scratch.
Mon May 14 2012
Artificial Intelligence
Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty. Finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning.
Mon Dec 12 2016
Artificial Intelligence
Online Reinforcement Learning for Real-Time Exploration in Continuous State and Action Markov Decision Processes
This paper presents a new method to learn online policies in continuous state, continuous action, model-free Markov decision processes. Our algorithm, the Fitted Policy Forest algorithm (FPF), computes a regression forest representing the Q-value and transforms it into a single tree representing the policy.
Tue Feb 14 2012
Artificial Intelligence
Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search
Bayes-optimal behavior, while well-defined, is often difficult to achieve. Recent advances in Monte-Carlo tree search have shown that it is possible to act near-optimally in Markov Decision Processes.