Published on Thu Feb 25 2021

Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with √T Regret

Asaf Cassel, Tomer Koren

Abstract

We consider the task of learning to control a linear dynamical system under fixed quadratic costs, known as the Linear Quadratic Regulator (LQR) problem. While model-free approaches are often favorable in practice, thus far only model-based methods, which rely on costly system identification, have been shown to achieve regret that scales with the optimal dependence on the time horizon T. We present the first model-free algorithm that achieves similar regret guarantees. Our method relies on an efficient policy gradient scheme, and a novel and tighter analysis of the cost of exploration in policy space in this setting.
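
To fix ideas, here is a minimal sketch of the kind of zeroth-order policy gradient loop such model-free results analyze: the learner perturbs the gain matrix of a linear policy, observes only rollout costs, and forms a gradient estimate in policy space. The toy system, the two-point estimator, and the hyperparameters eta and r below are illustrative assumptions, not the paper's actual algorithm or tuning.

```python
import numpy as np

# Minimal sketch of a zeroth-order (bandit) policy gradient loop for LQR.
# Everything here is illustrative: the toy system (A, B, Q, R), the
# two-point gradient estimator, and the hyperparameters eta and r are
# assumptions, not the paper's algorithm.

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])  # dynamics: hidden from the learner
B = np.array([[0.0], [0.1]])            # dynamics: hidden from the learner
Q, R = np.eye(2), np.eye(1)             # known quadratic cost matrices

def rollout_cost(K, horizon=100):
    """Play the linear policy u = -K x and return the average quadratic
    cost. The learner observes only this scalar, never A or B."""
    x, total = np.zeros(2), 0.0
    for _ in range(horizon):
        u = -K @ x
        total += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + 0.1 * rng.standard_normal(2)
    return total / horizon

def pg_step(K, eta=1e-3, r=0.2):
    """Two-point gradient estimate in gain space, then one descent step."""
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)              # random direction on the unit sphere
    delta = rollout_cost(K + r * U) - rollout_cost(K - r * U)
    grad_est = (K.size / (2 * r)) * delta * U
    return K - eta * grad_est

K = np.array([[0.5, 1.0]])              # an initial stabilizing gain
for _ in range(500):
    K = pg_step(K)
print("learned gain:", K, "average cost:", rollout_cost(K))
```

The two-point estimator is a standard variance-reduction device for estimating gradients from costs alone; keeping the perturbed gains stabilizing throughout is one of the technical issues a regret analysis must handle.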

Wed Feb 19 2020
Machine Learning
Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently
We consider the problem of learning in Linear Quadratic Control systems whose transition parameters are initially unknown. Recent results demonstrated efficient learning algorithms with regret growing with the square root of the number of decision steps.
Sun Feb 17 2019
Machine Learning
Learning Linear-Quadratic Regulators Efficiently with only √T Regret
We present the first computationally-efficient algorithm with Õ(√T) regret for learning in Linear Quadratic Control systems with unknown dynamics.
Fri Nov 20 2020
Machine Learning
Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem. We are able to produce a global linear convergence guarantee for the policy gradient method in the setting of finite time horizon and stochastic state dynamics.
Mon Jan 15 2018
Machine Learning
Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator
Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons. These methods must solve a non-convex optimization problem, where little is understood about their efficiency from computational and statistical perspectives.
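
For concreteness, the non-convex problem in question is optimization over the feedback gain K of a linear policy. In a standard infinite-horizon formulation (not specific to this paper), the objective is

```latex
C(K) \;=\; \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}\!\left[\, \sum_{t=1}^{T} x_t^\top Q\, x_t + u_t^\top R\, u_t \right],
\qquad x_{t+1} = A x_t + B u_t + w_t, \quad u_t = -K x_t,
```

which is non-convex as a function of K; the central finding of this line of work is that gradient methods nonetheless reach the global optimum.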
Tue May 28 2019
Machine Learning
Learning robust control for LQR systems with multiplicative noise via policy gradient
The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement learning-based control of complex systems. We show that the multiplicative noise LQR cost has a special property called gradient domination.
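
Gradient domination is a Polyak-Lojasiewicz-type property: the suboptimality of any stabilizing gain K is controlled by the gradient of the cost at K. Schematically, for a problem-dependent constant λ > 0 whose exact form depends on the noise model,

```latex
C(K) - C(K^\star) \;\le\; \lambda\, \left\| \nabla C(K) \right\|_F^2,
```

which is what lets first-order methods converge globally despite the non-convexity of C.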
Fri Jun 12 2020
Machine Learning
Combining Model-Based and Model-Free Methods for Nonlinear Control: A Provably Convergent Policy Gradient Approach
Model-free learning-based control methods have seen great success recently. However, such methods typically suffer from poor sample complexity and limited convergence guarantees. In this paper, we combine model-based and model-free methods to achieve the best of both worlds.