Published on Thu Dec 24 2020

Assured RL: Reinforcement Learning with Almost Sure Constraints

Agustin Castellano, Juan Bazerque, Enrique Mallada

We consider the problem of finding optimal policies for a Markov Decision Process with almost sure constraints on state transitions and action triplets. We develop a Barrier-learning algorithm, based on Q-Learning, that identifies unsafe state-action pairs.

Abstract

We consider the problem of finding optimal policies for a Markov Decision Process with almost sure constraints on state transitions and action triplets. We define value and action-value functions that satisfy a barrier-based decomposition which allows for the identification of feasible policies independently of the reward process. We prove that, given a policy π, certifying whether certain state-action pairs lead to feasible trajectories under π is equivalent to solving an auxiliary problem aimed at finding the probability of performing an unfeasible transition. Using this interpretation, we develop a Barrier-learning algorithm, based on Q-Learning, that identifies such unsafe state-action pairs. Our analysis motivates the need to enhance the Reinforcement Learning (RL) framework with an additional signal, besides rewards, called here the damage function, which provides feasibility information and enables the solution of RL problems with model-free constraints. Moreover, our Barrier-learning algorithm wraps around existing RL algorithms, such as Q-Learning and SARSA, giving them the ability to solve almost-surely constrained problems.
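
The listing includes no code, but the barrier-based idea above can be illustrated with a short tabular sketch. The snippet below is a minimal, hypothetical rendering of a Q-Learning loop augmented with a damage signal and a barrier table, not the authors' implementation; the environment interface (env.reset, env.step returning a damage indicator) and all hyperparameters are assumptions for illustration.

```python
import numpy as np

def barrier_q_learning(env, n_states, n_actions, episodes=500,
                       alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))   # reward action-value estimates
    B = np.zeros((n_states, n_actions))   # barrier: 0 = presumed safe, -inf = flagged unsafe
    for _ in range(episodes):
        s = env.reset()                   # assumed interface: returns an integer state
        done = False
        while not done:
            safe = np.flatnonzero(B[s] == 0)      # actions not yet flagged as unsafe
            if safe.size == 0:
                safe = np.arange(n_actions)       # no known-safe action: fall back to all
            if rng.random() < eps:
                a = int(rng.choice(safe))         # epsilon-greedy over safe actions
            else:
                a = int(safe[np.argmax(Q[s, safe])])
            # assumed interface: step returns a damage indicator alongside the reward
            s_next, r, damage, done = env.step(a)
            if damage > 0:
                B[s, a] = -np.inf                 # (s, a) led to an infeasible transition
            else:
                feas_next = np.flatnonzero(B[s_next] == 0)
                if feas_next.size == 0:
                    B[s, a] = -np.inf             # no safe action from s_next: propagate the barrier
                else:
                    target = r + gamma * np.max(Q[s_next, feas_next])
                    Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q, B
```

In this sketch the barrier table stands in for the barrier term of the value decomposition: flagged pairs are never selected again, and infeasibility propagates backwards through bootstrapping, while the reward Q-table is learned only over actions currently deemed safe.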

Tue May 18 2021
Machine Learning
Learning to Act Safely with Limited Exposure and Almost Sure Certainty
This paper aims to put forward the concept that learning to take safe actions can be achieved without the need for an unbounded number of exploratory trials. We first focus on the canonical multi-armed bandit problem. We then consider the problem of finding optimal policies for a Markov Decision Process (MDP) with almost sure constraints.
Thu Jun 03 2021
Machine Learning
A Provably-Efficient Model-Free Algorithm for Constrained Markov Decision Processes
Triple-Q is an algorithm for Constrained Markov Decision Processes (CMDPs) with sublinear regret and zero constraint violation. Under Triple-Q, at each step, an action is chosen based on a pseudo-Q-value that is a combination of three Q-values.
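
As a loose illustration of that action-selection rule (not the paper's exact construction), the sketch below mixes a reward Q-table and a constraint Q-table using a virtual-queue weight; the names Q_reward, Q_constraint, Z, and the scaling eta are assumptions for illustration.

```python
import numpy as np

def triple_q_action(Q_reward, Q_constraint, Z, state, eta=1.0):
    """Greedy action w.r.t. a pseudo-Q-value mixing reward and constraint estimates."""
    pseudo_q = Q_reward[state] + (Z / eta) * Q_constraint[state]  # constraint term weighted by virtual queue Z
    return int(np.argmax(pseudo_q))
```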
Thu Feb 27 2020
Artificial Intelligence
Cautious Reinforcement Learning via Distributional Risk in the Dual Domain
We study the estimation of risk-sensitive policies in reinforcement learning. We propose a new definition of risk, which we call caution, expressed as a penalty function of the policy's long-term state occupancy distribution.
Wed Dec 30 2020
Artificial Intelligence
Is Pessimism Provably Efficient for Offline RL?
Offline reinforcement learning (RL) aims to learn an optimal policy based on a dataset collected a priori. Offline RL suffers from insufficient coverage of the dataset, which eludes most existing theoretical analysis. In this paper, we propose a pessimistic variant of the value iteration algorithm (PEVI).
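
A hedged sketch of the pessimism principle behind this line of work (not PEVI's exact algorithm): value estimates are penalized where the offline dataset offers little coverage, and the policy is greedy with respect to the penalized values. The visit-count array and the penalty scale beta below are assumptions for illustration.

```python
import numpy as np

def pessimistic_greedy_policy(Q_hat, counts, beta=1.0):
    """Greedy policy over Q estimates penalized by a data-coverage uncertainty term."""
    penalty = beta / np.sqrt(np.maximum(counts, 1))  # larger penalty where (s, a) is rarely seen
    return np.argmax(Q_hat - penalty, axis=1)        # one action per state
```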
Wed Feb 19 2020
Machine Learning
Optimistic Policy Optimization with Bandit Feedback
Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms. We consider model-based RL in the tabular finite-horizon MDP setting with unknown transitions and bandit feedback.
Wed Sep 26 2018
Machine Learning
Omega-Regular Objectives in Model-Free Reinforcement Learning
We provide the first solution for model-free reinforcement learning of omega-regular objectives for Markov decision processes. A key feature of our technique is the compilation of omega-regular properties into limit-deterministic Büchi automata instead of the traditional Rabin automata.