Published on Sun Jul 21 2019

Alice's Adventures in the Markovian World

Zhanzhan Zhao, Haoran Sun

The proposed algorithm generalizes Follow-the-Leader, one of the most fundamental online learning algorithms, to a linear Gauss-Markov process setting. Simulations compare the performance of Alice with another recent work and demonstrate Alice's flexibility.

Abstract

This paper proposes an algorithm, Alice, that has no access to the physical law of the environment, which is in fact linear with stochastic noise, and that learns to make decisions directly online without a training phase or a stable policy as initial input. Rather than estimating the system parameters or the value functions online, the algorithm generalizes Follow-the-Leader, one of the most fundamental online learning algorithms, to a linear Gauss-Markov process setting, adding a regularization term similar to the momentum method in gradient descent and a feasible online constraint inspired by Lyapunov's Second Theorem. The proposed algorithm can be viewed as a mirror optimization to model predictive control. Knowing only the state-action alignment relationship, and observing every state exactly, we give a no-regret proof of the algorithm in the noise-free case. For the general linear system with stochastic noise, we present an analysis together with a sufficient condition for the no-regret guarantee. Simulations compare the performance of Alice with another recent work and demonstrate Alice's flexibility.
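To make the Follow-the-Leader idea in a linear-dynamics setting concrete, here is a minimal, hedged sketch: at each round the controller picks the single action that would have minimized the cumulative one-step cost over all states seen so far, plus a momentum-like penalty keeping it close to the previous action. This is not the paper's Alice algorithm; the dynamics matrices A and B, the quadratic stage cost, the regularization weight mu, and the action grid are all illustrative assumptions.

```python
# Illustrative Follow-the-Leader-style controller on a linear system
# x_{t+1} = A x_t + B u_t + w_t. NOT the paper's algorithm: A, B, the cost
# weights Q and R, the momentum weight mu, and the action grid are assumptions.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # assumed state-transition matrix
B = np.array([[0.0], [1.0]])             # assumed input matrix
Q, R = np.eye(2), 0.1 * np.eye(1)        # assumed quadratic stage-cost weights
mu = 0.5                                 # momentum-like regularization weight
T = 50

x = np.zeros(2)
u_prev = np.zeros(1)
past_states = []

for t in range(T):
    past_states.append(x.copy())

    # Follow-the-Leader flavour: choose the action minimizing the cumulative
    # one-step look-ahead cost over all past states, plus a momentum penalty.
    def cumulative_cost(u_scalar):
        u = np.atleast_1d(u_scalar)
        total = mu * np.sum((u - u_prev) ** 2)
        for xs in past_states:
            x_next = A @ xs + B @ u
            total += x_next @ Q @ x_next + u @ R @ u
        return total

    candidates = np.linspace(-2.0, 2.0, 201)          # crude 1-D action grid
    u = np.array([min(candidates, key=cumulative_cost)])

    w = 0.01 * rng.standard_normal(2)                 # small Gaussian state noise
    x = A @ x + B @ u + w
    u_prev = u

print("final state:", x)
```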

Wed Mar 25 2020
Machine Learning
Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems
Adaptive and closed-loop system identification is a challenging problem due to correlations introduced in data. We present the first model estimation method with finite-time guarantees in both open and closed loop system identification. We propose an efficient reinforcement learning algorithm that adaptively learns the system dynamics.
Mon Oct 28 2013
Machine Learning
Relax but stay in control: from value to algorithms for online Markov decision processes
Online learning algorithms are designed to perform in non-stationary environments, but they typically have no notion of a dynamic state to model constraints on current and future actions as a function of past actions. The presence of the state and the assumption of an arbitrarily varying environment complicate the development of computationally efficient methods.
Wed Jun 10 2020
Machine Learning
Making Non-Stochastic Control (Almost) as Easy as Stochastic
Online LQR is a modern learning-theoretic take on the classical control problem. We attain the optimal regret when the linear dynamical system is unknown to the learner. Our results accommodate the full generality of the non-stochastic setting.
Tue Jan 14 2014
Machine Learning
Online Markov decision processes with Kullback-Leibler control cost
This paper considers an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space. The online aspect of the problem is due to the fact that the state cost functions are generated by a dynamic environment.
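As a minimal sketch of the kind of setup described (not the paper's algorithm): a random walk over a finite state space whose per-step objective combines a time-varying state cost, revealed online, with a KL control cost relative to some passive dynamics. The passive transition matrix P0, the random cost vectors, and the one-step greedy policy below are all assumptions made for the example; the KL-regularized minimizer used here is the standard Gibbs re-weighting.

```python
# Illustrative online random walk with a KL control cost. NOT the paper's
# method: P0, the cost stream, and the greedy one-step policy are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_states, T = 5, 20

# Assumed passive dynamics of the random walk (row-stochastic matrix).
P0 = rng.random((n_states, n_states))
P0 /= P0.sum(axis=1, keepdims=True)

state = 0
for t in range(T):
    c = rng.random(n_states)  # state costs revealed online by the environment
    # One-step greedy policy minimizing E[c(x')] + KL(p(.|x) || P0(.|x)):
    # the minimizer is the Gibbs re-weighting p(x'|x) ∝ P0(x'|x) exp(-c(x')).
    weights = P0[state] * np.exp(-c)
    p = weights / weights.sum()
    state = rng.choice(n_states, p=p)

print("final state:", state)
```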
Tue May 28 2019
Machine Learning
A General Markov Decision Process Framework for Directly Learning Optimal Control Policies
We consider a new form of reinforcement learning (RL) that is based on opportunities to directly learn the optimal control policy. We establish the convergence of Q-learning within the context of our MDP framework. Our empirical results demonstrate the significant benefits of our approach.
Fri Feb 26 2021
Machine Learning
A Regret Minimization Approach to Iterative Learning Control
A new performance metric, planning regret, replaces the standard stochastic uncertainty assumptions with worst-case regret. We provide theoretical and empirical evidence that the proposed algorithm outperforms existing methods on several benchmarks.