Published on Mon Dec 07 2015

How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies

Vincent François-Lavet, Raphael Fonteneau, Damien Ernst

Abstract

Using deep neural nets as function approximators for reinforcement learning tasks has recently been shown to be very powerful for solving problems approaching real-world complexity. Using these results as a benchmark, we discuss the role that the discount factor may play in the quality of the learning process of a deep Q-network (DQN). When the discount factor progressively increases up to its final value, we empirically show that it is possible to significantly reduce the number of learning steps. When this schedule is used in conjunction with a varying learning rate, we empirically show that it outperforms the original DQN on several experiments. We relate this phenomenon to the instabilities of neural networks when they are used in an approximate dynamic programming setting. We also describe the possibility of falling into a local optimum during the learning process, thus connecting our discussion with the exploration/exploitation dilemma.
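To make the mechanism concrete, below is a minimal sketch of the kind of training loop the abstract describes: the gap (1 - gamma) shrinks geometrically each epoch, so the effective planning horizon 1/(1 - gamma) grows over training, while the learning rate is decayed alongside it. All constants here (initial gamma, decay factors, bounds) are illustrative assumptions, not the paper's exact hyperparameters.

# Minimal sketch of a Q-learning loop with an increasing discount factor,
# in the spirit of the abstract above. The schedule and all constants are
# illustrative assumptions, not the paper's exact hyperparameters.

GAMMA_FINAL = 0.99   # final discount factor (assumed)
GAMMA_DECAY = 0.98   # per-epoch contraction of (1 - gamma) (assumed)
LR_DECAY = 0.98      # per-epoch learning-rate decay (assumed)

gamma = 0.90         # start with a short effective horizon (assumed)
lr = 5e-4            # initial learning rate (assumed)

for epoch in range(100):
    # ... run one epoch of DQN updates here, using the current `gamma`
    # in the bootstrapped target y = r + gamma * max_a Q_target(s', a)
    # and the current `lr` as the optimizer step size ...

    # Shrink (1 - gamma) geometrically: the effective horizon
    # 1 / (1 - gamma) grows over training, capped at the final value.
    gamma = min(GAMMA_FINAL, 1.0 - GAMMA_DECAY * (1.0 - gamma))

    # Decay the learning rate alongside gamma; the abstract reports that
    # pairing the two schedules improves over the original DQN.
    lr = max(1e-5, lr * LR_DECAY)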

Thu Dec 06 2018
Artificial Intelligence
Deep Reinforcement Learning and the Deadly Triad
Sutton and Barto identify a deadly triad of function approximation, bootstrapping, and off-policy learning. When these three properties are combined, learning can diverge, with the value estimates becoming unbounded.
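As a toy illustration of that divergence, the sketch below uses the classic two-state construction (one shared linear weight, features 1 and 2, zero reward), which is an assumed setup and not taken from the paper itself: repeatedly applying an off-policy, bootstrapped TD(0) update to the first state makes the weight, and hence the value estimates, grow without bound whenever gamma > 0.5.

# Toy two-state example of the deadly triad (assumed setup, not taken
# from the paper): linear function approximation + bootstrapping +
# off-policy updates make the value estimates grow without bound.

phi = [1.0, 2.0]   # features of states 1 and 2; values are w*1 and w*2
gamma = 0.99       # discount factor
alpha = 0.1        # step size
w = 1.0            # shared weight; any nonzero start diverges

for step in range(50):
    # Off-policy: only the state-1 -> state-2 transition (reward 0) is
    # ever updated, so the estimate at state 2 is never corrected.
    td_error = 0.0 + gamma * (w * phi[1]) - w * phi[0]
    w += alpha * td_error * phi[0]   # semi-gradient TD(0) update

print(w)  # w multiplies by 1 + alpha*(2*gamma - 1) each step: unbounded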
Fri Feb 12 2016
Machine Learning
Using Deep Q-Learning to Control Optimization Hyperparameters
We present a novel definition of the reinforcement learning state, actions and reward function that allows a deep Q-network (DQN) to learn to control an optimization hyperparameter. Using Q-learning with experience replay, we train two DQNs to accept a state representation of an …
Sun Aug 06 2017
Artificial Intelligence
An Information-Theoretic Optimality Principle for Deep Reinforcement Learning
We address the problem of Q-value overestimation in deep reinforcement learning to handle high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty that encourages reduced Q-value estimates. The resultant algorithm encompasses a wide range of learning outcomes containing deep Q-networks …
Tue Jun 29 2021
Machine Learning
A Convergent and Efficient Deep Q Network Algorithm
The deep Q network (DQN) algorithm is still not well understood, and it does not guarantee convergence. To overcome these problems, we propose a convergent DQN algorithm. We show that the algorithm is convergent and can work with large discount factors (0.9998).
Thu Jul 16 2020
Machine Learning
Meta-Gradient Reinforcement Learning with an Objective Discovered Online
The algorithm discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment. Over time, this allows the agent to learn how to learn increasingly effectively. We demonstrate that the algorithm discovers how to address several important issues in RL.
Fri Jun 11 2021
Machine Learning
Taylor Expansion of Discount Factors
In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the objective. In this work, we study the effect that this discrepancy has during learning, and discover a family of objectives that interpolate value functions.
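A sketch of the kind of identity behind such an interpolating family, stated under standard tabular assumptions (transition matrix P, reward vector r, so that V_gamma = (I - gamma P)^{-1} r); the framing as truncated objectives is our reading, not the paper's exact statement:

% Writing \gamma' = \gamma + \Delta and expanding the resolvent
% (I - \gamma' P)^{-1} as a Neumann series around \gamma gives
\[
  V_{\gamma'} \;=\; \sum_{k=0}^{\infty} \Delta^{k}\,
  \bigl[(I - \gamma P)^{-1} P \bigr]^{k}\, V_{\gamma},
  \qquad \Delta = \gamma' - \gamma, \quad 0 \le \gamma \le \gamma' < 1 .
\]
% The series converges because the spectral radius of
% \Delta (I - \gamma P)^{-1} P is at most \Delta / (1 - \gamma) < 1.
% Truncating at order K yields a family of objectives interpolating
% between V_\gamma (K = 0) and V_{\gamma'} (K \to \infty).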