Published on Fri Jul 03 2020

Expected Eligibility Traces

Hado van Hasselt, Sephora Madjiheurem, Matteo Hessel, David Silver, André Barreto, Diana Borsa

The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem. Expected traces allow a single update to assign credit to states that could have preceded the current state, even if they did not do so on this occasion.

Abstract

The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence. Eligibility traces enable efficient credit assignment to the recent sequence of states and actions experienced by the agent, but not to counterfactual sequences that could also have led to the current state. In this work, we introduce expected eligibility traces. Expected traces allow a single update to assign credit to states and actions that could have preceded the current state, even if they did not do so on this occasion. We discuss when expected traces provide benefits over classic (instantaneous) traces in temporal-difference learning, and show that sometimes substantial improvements can be attained. We provide a way to smoothly interpolate between instantaneous and expected traces by a mechanism similar to bootstrapping, which ensures that the resulting algorithm is a strict generalisation of TD(λ). Finally, we discuss possible extensions and connections to related ideas, such as successor features.
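In the tabular case, the idea in the abstract can be illustrated by learning the expected trace as a running average of the instantaneous accumulating trace observed in each state, and then using that expectation in place of the instantaneous trace in the TD update. The sketch below is only an illustration of this idea, not the paper's own pseudocode: the environment interface (reset/step for a Markov reward process), the step sizes alpha and beta, and the use_expected flag are all assumptions.

```python
import numpy as np

def expected_td_lambda(env, num_states, num_episodes,
                       gamma=0.99, lam=0.9, alpha=0.1, beta=0.05,
                       use_expected=True):
    """Tabular TD(lambda) with (optionally) expected eligibility traces.

    Minimal sketch: z[s] is learned as a running average of the
    instantaneous accumulating trace e whenever state s is visited,
    and is then used in place of e for the value update.
    The env interface (reset() -> state, step() -> (next_state, reward, done))
    is an assumption for illustration.
    """
    v = np.zeros(num_states)                 # state-value estimates
    z = np.zeros((num_states, num_states))   # expected trace, one vector per state
    for _ in range(num_episodes):
        e = np.zeros(num_states)             # instantaneous accumulating trace
        s = env.reset()
        done = False
        while not done:
            s_next, r, done = env.step()
            # classic accumulating trace: decay, then add credit for the visited state
            e = gamma * lam * e
            e[s] += 1.0
            # learn the expected trace: move z[s] towards the observed trace
            z[s] += beta * (e - z[s])
            # TD error and value update, using the expected or instantaneous trace
            delta = r + (0.0 if done else gamma * v[s_next]) - v[s]
            trace = z[s] if use_expected else e
            v += alpha * delta * trace
            s = s_next
    return v
```

With use_expected=False this reduces to ordinary TD(λ) with accumulating traces; with use_expected=True a single TD error also propagates credit to states that typically precede the current one, which is the behaviour the abstract describes.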

Fri Oct 31 2008
Artificial Intelligence
Temporal Difference Updating without a Learning Rate
We derive an equation for temporal difference learning from statistical principles. We test this new learning rule against TD(lambda) and find that it offers superior performance in various settings. We then investigate how to extend our new temporal difference algorithm to reinforcement learning.
Sat Dec 07 2019
Artificial Intelligence
From Reinforcement Learning to Optimal Control: A unified framework for sequential decisions
We focus on two of the most important fields: stochastic optimal control and reinforcement learning. Building on prior work, we describe a unified framework that covers 15 different communities. We make the case that the framework of reinforcement learning is quite limited.
Sun Oct 06 2019
Machine Learning
Probabilistic Successor Representations with Kalman Temporal Differences
The effectiveness of Reinforcement Learning (RL) depends on an animal's ability to assign credit for rewards to the appropriate preceding stimuli. The Successor Representation (SR), which enforces generalisation over states that predict similar outcomes, has become an increasingly popular model in this space. We propose...
Tue May 30 2017
Artificial Intelligence
Experience Replay Using Transition Sequences
Experience replay is one of the most commonly used approaches to improve the efficiency of reinforcement learning algorithms. In this work, we propose an approach to select and replay sequences of transitions in order to accelerate the learning of a reinforcement learning agent. We also artificially construct transition sequences using information gathered from previous interactions.
Wed Sep 14 2016
Artificial Intelligence
Bayesian Reinforcement Learning: A Survey
Bayesian methods for machine learning have been widely investigated. We provide an in-depth review of the role of Bayesian methods in the reinforcement learning (RL) paradigm. The paper is a comprehensive survey on Bayesian RL algorithms.
Wed Aug 19 2015
Machine Learning
Learning to Predict Independent of Span
Conventional algorithms wait until an outcome is observed to update their predictions. We show that the exact same predictions can be learned in a much more computationally congenial way. We apply this idea to various settings of increasing generality.