Published on Sat Jan 04 2014

Least Squares Policy Iteration with Instrumental Variables vs. Direct Policy Search: Comparison Against Optimal Benchmarks Using Energy Storage

Warren R. Scott, Warren B. Powell, Somayeh Moazeni


Abstract

This paper studies approximate policy iteration (API) methods that use least-squares Bellman error minimization for policy evaluation. We address several enhancements of this approach, namely Bellman error minimization using instrumental variables, least-squares projected Bellman error minimization, and projected Bellman error minimization using instrumental variables. We prove that, for a general discrete-time stochastic control problem, Bellman error minimization using instrumental variables is equivalent to both variants of projected Bellman error minimization. An alternative to these API methods is direct policy search based on the knowledge gradient. The practical performance of these three approximate dynamic programming methods is then investigated in the context of an energy storage application, in which storage is integrated with an intermittent wind energy supply to fully serve a stochastic, time-varying electricity demand. We create a library of test problems using real-world data and apply value iteration to find their optimal policies. These benchmarks are then used to compare the developed policies. Our analysis indicates that API with instrumental-variables Bellman error minimization substantially outperforms API with least-squares Bellman error minimization. However, both approaches underperform our direct policy search implementation.
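As a rough sketch of the policy-evaluation step the abstract contrasts, assume a linear value function approximation V(s) ≈ φ(s)ᵀθ and a batch of sampled transitions (rows of φ(s) in `phi`, rows of φ(s') in `phi_next`, one-period costs in `cost`). The function names and data layout below are illustrative, not the authors' implementation:

```python
import numpy as np

def ls_bellman_error(phi, phi_next, cost, gamma):
    # Least-squares Bellman error minimization: regress the one-period
    # cost on (phi - gamma * phi_next). Biased when the next-state
    # features are noisy, because the regressor is then correlated
    # with the regression residual (errors-in-variables).
    X = phi - gamma * phi_next                    # (n_samples, d)
    theta, *_ = np.linalg.lstsq(X, cost, rcond=None)
    return theta

def iv_bellman_error(phi, phi_next, cost, gamma):
    # Instrumental-variables version: use the current-state features
    # phi as instruments instead of regressing on the noisy difference.
    A = phi.T @ (phi - gamma * phi_next)          # (d, d)
    b = phi.T @ cost
    return np.linalg.solve(A, b)
```

The instrumental-variables solve is the familiar LSTD system Aθ = b, which is why it coincides with projected Bellman error minimization, the equivalence the paper proves.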

Tue Apr 06 2021
Machine Learning
MPC-based Reinforcement Learning for Economic Problems with Application to Battery Storage
Mon Nov 28 2016
Artificial Intelligence
Accelerated Gradient Temporal Difference Learning
The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods to data-efficient least-squares methods. Least-squares methods make the best use of available data and do not require tuning a typically highly sensitive learning-rate parameter, but they require quadratic computation and storage.
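As an illustration of this spectrum (a sketch assuming linear features of dimension d, not code from the paper): a TD(0) update costs O(d) per step but needs a step size, while an incremental least-squares solver maintains d-by-d statistics.

```python
import numpy as np

def td0_update(theta, phi_s, phi_next, reward, gamma, alpha):
    # One TD(0) step: O(d) time and memory, but requires a step size.
    td_error = reward + gamma * phi_next @ theta - phi_s @ theta
    return theta + alpha * td_error * phi_s

class LSTD:
    # Incremental LSTD: no step size to tune, but O(d^2) storage and
    # computation; call solve() only once A has enough samples to be
    # invertible.
    def __init__(self, d):
        self.A = np.zeros((d, d))
        self.b = np.zeros(d)

    def update(self, phi_s, phi_next, reward, gamma):
        self.A += np.outer(phi_s, phi_s - gamma * phi_next)
        self.b += reward * phi_s

    def solve(self):
        return np.linalg.solve(self.A, self.b)
```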
Wed Jul 21 2021
Artificial Intelligence
Optimal Operation of Power Systems with Energy Storage under Uncertainty: A Scenario-based Method with Strategic Sampling
The multi-period dynamics of energy storage (ES), intermittent renewable generation, and uncontrollable power loads make the optimization of power system operation (PSO) challenging. A multi-period optimal PSO under uncertainty is formulated using the chance-constrained optimization (CCO) paradigm.
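For intuition, a minimal sketch of the scenario-based treatment of a chance constraint (the names, tolerance, and toy data are assumptions for illustration, not the paper's formulation):

```python
import numpy as np

def scenario_feasible(x, constraint, scenarios, epsilon=0.05):
    # Empirical check of the chance constraint
    # P[constraint(x, xi) <= 0] >= 1 - epsilon using sampled scenarios.
    violations = sum(constraint(x, xi) > 0 for xi in scenarios)
    return violations / len(scenarios) <= epsilon

# Toy usage: dispatch level x must cover the net load in most scenarios.
rng = np.random.default_rng(0)
net_load = rng.normal(50.0, 5.0, size=500)
covers = lambda x, xi: xi - x   # positive value = load exceeds dispatch
print(scenario_feasible(60.0, covers, net_load))
```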
Tue Jun 02 2020
Machine Learning
Learning optimal environments using projected stochastic gradient ascent
In this work, we propose a new methodology for jointly sizing a dynamical system and designing its control law. The optimization objective is to find a control policy and an environment over the joint hypothesis space of parameters.
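A generic sketch of the projected stochastic gradient ascent loop named in the title (the gradient estimator, step size, and box feasible set are assumptions for illustration, not details from the paper):

```python
import numpy as np

def projected_sga(grad_estimator, x0, project, steps=2000, lr=1e-2):
    # Generic projected stochastic gradient ascent over a joint
    # (environment, policy) parameter vector: take a noisy ascent
    # step, then project back onto the feasible set.
    x = project(np.asarray(x0, dtype=float))
    for _ in range(steps):
        x = project(x + lr * grad_estimator(x))
    return x

# Toy usage: maximize -||x - 0.7||^2 over the box [0, 1]^3.
rng = np.random.default_rng(0)
box = lambda x: np.clip(x, 0.0, 1.0)
noisy_grad = lambda x: -2.0 * (x - 0.7) + 0.01 * rng.standard_normal(x.shape)
x_opt = projected_sga(noisy_grad, np.zeros(3), box)
```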
Mon Nov 03 2014
Artificial Intelligence
NESTA, The NICTA Energy System Test Case Archive
Many of the well-established power system test cases were developed in the 1960s with the aim of testing AC power flow algorithms. It is unclear whether these power flow test cases are suitable for power system optimization studies.
Mon Aug 02 2021
Machine Learning
Prescribing net demand for electricity market clearing
We consider a two-stage electricity market comprising a forward and a real-time settlement. The former pre-dispatches the power system following a least-cost merit order and facing an uncertain net demand. The latter copes with the plausible deviations by making use of power regulation during the actual operation of the system.
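A stylized numerical illustration of such a two-stage settlement (the cost parameters and the newsvendor-style regulation model are assumptions for illustration, not the paper's market model):

```python
import numpy as np

def two_stage_cost(p_forward, demand_scenarios, c_energy, c_up, c_down):
    # Expected cost of a forward schedule: the forward market pays
    # c_energy per unit scheduled, while real-time regulation covers
    # shortfalls (c_up) and surpluses (c_down) against realized demand.
    shortfall = np.maximum(demand_scenarios - p_forward, 0.0)
    surplus = np.maximum(p_forward - demand_scenarios, 0.0)
    return c_energy * p_forward + np.mean(c_up * shortfall + c_down * surplus)

# Toy usage: pick the forward schedule minimizing expected cost on a grid.
rng = np.random.default_rng(1)
demand = rng.normal(100.0, 10.0, size=1000)
grid = np.linspace(70.0, 130.0, 121)
best = min(grid, key=lambda p: two_stage_cost(p, demand, 20.0, 50.0, 5.0))
```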