Published on Tue Dec 04 2018

Exploration versus exploitation in reinforcement learning: a stochastic control approach

Haoran Wang, Thaleia Zariphopoulou, Xunyu Zhou

We consider reinforcement learning (RL) in continuous time. We study the best trade-off between exploration of a black-box environment and exploitation of current knowledge. We find that the optimal feedback control distribution for balancing exploitation and exploration is Gaussian.

Abstract

We consider reinforcement learning (RL) in continuous time and study the problem of achieving the best trade-off between exploration of a black box environment and exploitation of current knowledge. We propose an entropy-regularized reward function involving the differential entropy of the distributions of actions, and motivate and devise an exploratory formulation for the state dynamics that captures repetitive learning under exploration. The resulting optimization problem is a revitalization of the classical relaxed stochastic control. We carry out a complete analysis of the problem in the linear-quadratic (LQ) setting and deduce that the optimal feedback control distribution for balancing exploitation and exploration is Gaussian. This in turn interprets and justifies the widely adopted Gaussian exploration in RL, beyond its simplicity for sampling. Moreover, exploitation and exploration are captured, respectively and mutually exclusively, by the mean and variance of the Gaussian distribution. We also find that a more random environment contains more learning opportunities, in the sense that less exploration is needed. We characterize the cost of exploration, which, in the LQ case, is shown to be proportional to the entropy regularization weight and inversely proportional to the discount rate. Finally, as the weight of exploration decays to zero, we prove convergence of the solution of the entropy-regularized LQ problem to that of the classical LQ problem.
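As a rough sketch of the formulation described in the abstract (the notation below is ours and may differ from the paper's), the agent controls a distribution pi_t over actions rather than a single action, the state follows "exploratory" dynamics obtained by averaging the classical drift and squared diffusion over pi_t, and the running reward is augmented by the differential entropy of pi_t weighted by a temperature parameter lambda:

\[
V(x) \;=\; \sup_{\pi}\, \mathbb{E}\!\left[\int_0^{\infty} e^{-\rho t}\left(\tilde r(X^{\pi}_t,\pi_t) \;-\; \lambda \int_U \pi_t(u)\ln \pi_t(u)\,du\right) dt \,\Big|\, X^{\pi}_0 = x\right],
\]
\[
dX^{\pi}_t \;=\; \tilde b(X^{\pi}_t,\pi_t)\,dt \;+\; \tilde\sigma(X^{\pi}_t,\pi_t)\,dW_t,
\]
\[
\tilde b(x,\pi) = \int_U b(x,u)\,\pi(u)\,du,\qquad
\tilde\sigma^{2}(x,\pi) = \int_U \sigma^{2}(x,u)\,\pi(u)\,du,\qquad
\tilde r(x,\pi) = \int_U r(x,u)\,\pi(u)\,du .
\]

In the LQ case analyzed in the paper, the distribution attaining the supremum is Gaussian: its mean carries the exploitation component and its variance, which scales with the temperature lambda, carries the exploration component.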

Tue Oct 13 2015
Machine Learning
Dual Control for Approximate Bayesian Reinforcement Learning
Bayesian reinforcement learning, which reasons about the effect of actions and future observations, offers a principled solution but is intractable. We review, then extend, an old approximate approach from control theory. The resulting framework offers a useful approximation to aspects of Bayesian RL.
Thu Jan 09 2020
Machine Learning
Regularity and stability of feedback relaxed controls
This paper proposes a relaxed control regularization with general exploration rewards to design robust feedback controls for multi-dimensional stochastic exit time problems. We show that a pre-computed feedback relaxed control performs robustly in a perturbed system.
Wed Apr 04 2018
Machine Learning
Information Maximizing Exploration with a Latent Dynamics Model
Reinforcement learning algorithms must handle the trade-off between exploration and exploitation. Many state-of-the-art deep reinforcement learning methods use noise in the action selection. We present an approach that uses a latent dynamics model to derive reward bonuses as a means of intrinsic motivation.
Wed May 02 2018
Machine Learning
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review
Reinforcement learning or optimal control provides a mathematical formalization of intelligent decision making that is powerful and broadly applicable. The connection between reinforcement learning and inference in probabilistic models is not immediately obvious. In this article, we will discuss how a generalization of reinforcement learning …
Wed Sep 01 2021
Machine Learning
A Survey of Exploration Methods in Reinforcement Learning
Exploration is an essential component of reinforcement learning algorithms; agents depend crucially on exploration to obtain informative data for the learning process.
Mon Mar 29 2021
Artificial Intelligence
Reinforcement Learning Beyond Expectation