Reinforcement Learning with Model Predictive Control#

Model Predictive Control-based Reinforcement Learning (mpcrl, for short) is a library for training model-based Reinforcement Learning (RL) [15] agents with Model Predictive Control (MPC) as function approximation [12].

PyPI version MIT License Python 3.9

Short introduction#

This framework, also referred to as RL with/using MPC, was first proposed in [5] and has since been shown to be effective in various applications, with different learning algorithms and sounder theory, e.g., [2, 3, 7, 16]. It merges two powerful control techniques into a single data-driven one:

  • MPC, a well-known control methodology that exploits a prediction model to predict the future behaviour of the environment and compute the optimal action

  • and RL, a Machine Learning paradigm that has achieved many successes in recent years (in games such as chess and Go, among others) and is highly adaptable to unknown and complex-to-model environments.
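To make the MPC ingredient concrete, here is a minimal receding-horizon sketch in plain Python. It does not use the mpcrl API. The scalar model, cost weights, horizon, and gridded input set are all hypothetical values chosen for illustration, and the optimization is done by brute-force enumeration rather than by a numerical solver.

```python
import itertools

import numpy as np

# Hypothetical, mildly unstable scalar model: s_next = A * s + B * u.
A, B = 1.1, 0.5
ACTIONS = np.linspace(-1.0, 1.0, 11)  # bounded inputs on a coarse grid
HORIZON = 3


def predicted_cost(s, inputs):
    """Roll the model forward over the horizon and sum the quadratic costs."""
    cost = 0.0
    for u in inputs:
        cost += s**2 + 0.1 * u**2  # stage cost
        s = A * s + B * u  # model prediction
    return cost + s**2  # terminal cost


def mpc_action(s):
    """Pick the best input sequence by enumeration; return its first input."""
    best = min(
        itertools.product(ACTIONS, repeat=HORIZON),
        key=lambda seq: predicted_cost(s, seq),
    )
    return best[0]


# Closed loop: re-solve at every step and apply only the first input.
s = 1.5
trajectory = [s]
for _ in range(15):
    u = mpc_action(s)
    s = A * s + B * u  # the plant is assumed to match the model here
    trajectory.append(s)

print(trajectory[0], trajectory[-1])
```

In practice the enumeration would be replaced by a numerical optimization over continuous inputs. The receding-horizon structure (predict, optimize, apply the first input, repeat) stays the same.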

The figure below shows the main idea behind this learning-based control approach. The MPC controller, parametrized in its objective, predictive model, and constraints (or a subset of these), acts both as policy provider (i.e., it provides an action to the environment given the current state) and as function approximation for the state and action value functions (i.e., it predicts the expected return obtained by following the current control policy from the given state and state-action pair, respectively). Concurrently, an RL algorithm tunes this parametrization of the MPC so as to improve the controller’s performance and achieve a (sub)optimal policy. Different algorithms can be employed for this purpose, two of the most successful being Q-learning [3] and Deterministic Policy Gradient (DPG) [2].

Figure: Diagram of the MPC-RL framework, illustrating its main idea.
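The interplay between the parametrized MPC and the RL algorithm can be illustrated with a deliberately small sketch. It does not use the mpcrl API, and every name and number in it is an assumption made for illustration: a scalar linear plant, an undiscounted quadratic stage cost, and a horizon-1 MPC whose only learnable parameter is the terminal cost weight `p`. The MPC supplies the policy, the value function V_p and the action value function Q_p, while a semi-gradient Q-learning update tunes `p`. Since the model is assumed exact, the learned weight should approach the fixed point of the Riccati recursion, which the last lines compute for comparison.

```python
import numpy as np

# Hypothetical scalar plant s_next = A * s + B * u, stage cost s^2 + R * u^2.
A, B, R = 0.9, 0.5, 0.5


def stage_cost(s, u):
    return s**2 + R * u**2


def q_value(p, s, u):
    """Q_p(s, u): horizon-1 MPC cost with the first action fixed to u."""
    s_next = A * s + B * u  # model prediction
    return stage_cost(s, u) + p * s_next**2


def policy(p, s):
    """MPC action: minimizer of Q_p(s, .), available in closed form here."""
    return -p * A * B * s / (R + p * B**2)


def value(p, s):
    """V_p(s) = min_u Q_p(s, u)."""
    return q_value(p, s, policy(p, s))


def q_learning(p=0.0, lr=0.05, episodes=2000, steps=5, seed=0):
    """Tune the terminal weight p with semi-gradient Q-learning."""
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s = rng.uniform(-2.0, 2.0)
        for _ in range(steps):
            u = policy(p, s) + 0.1 * rng.standard_normal()  # exploration
            s_next = A * s + B * u  # environment step (equal to the model)
            td_error = stage_cost(s, u) + value(p, s_next) - q_value(p, s, u)
            grad_q = (A * s + B * u) ** 2  # dQ_p/dp evaluated at (s, u)
            p += lr * td_error * grad_q
            s = s_next
    return p


def riccati_fixed_point(iters=200):
    """Reference value: iterate the scalar Riccati map to its fixed point."""
    p = 0.0
    for _ in range(iters):
        p = 1.0 + A**2 * p * R / (R + p * B**2)
    return p


p_learned = q_learning()
p_star = riccati_fixed_point()
print(p_learned, p_star)
```

In realistic settings the MPC has a longer horizon, constraints, and an imperfect model, so the gradient of Q with respect to the parameters is typically obtained from sensitivities of the optimization problem rather than from a closed-form expression as above.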

Author#

Filippo Airaldi, PhD Candidate [f.airaldi@tudelft.nl | filippoairaldi@gmail.com] at Delft Center for Systems and Control in Delft University of Technology.

Copyright (c) 2025 Filippo Airaldi.

Copyright notice: Technische Universiteit Delft hereby disclaims all copyright interest in the program “mpcrl” (Reinforcement Learning with Model Predictive Control) written by the Author(s). Prof. Dr. Ir. Fred van Keulen, Dean of ME.

References#

[1]

Christof Büskens and Helmut Maurer. Sensitivity analysis and real-time optimization of parametric nonlinear programming problems. In Martin Grötschel, Sven O. Krumke, and Jörg Rambau, editors, Online Optimization of Large Scale Systems, pages 3–16. Springer, Berlin, Heidelberg, 2001.

[2]

Wenqi Cai, Arash B. Kordabad, Hossein N. Esfahani, Anastasios M. Lekkas, and Sébastien Gros. MPC-based reinforcement learning for a simplified freight mission of autonomous surface vehicles. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 2990–2995. 2021.

[3]

Hossein Nejatbakhsh Esfahani, Arash Bahari Kordabad, and Sébastien Gros. Approximate robust NMPC using reinforcement learning. In 2021 European Control Conference (ECC), pages 132–137. 2021.

[4]

Sébastien Gros and Mario Zanon. Towards safe reinforcement learning using NMPC and policy gradients: part II - deterministic case. CoRR, 2019. URL: http://arxiv.org/abs/1906.04034, arXiv:1906.04034.

[5]

Sébastien Gros and Mario Zanon. Data-driven economic NMPC using reinforcement learning. IEEE Transactions on Automatic Control, 65(2):636–648, 2020.

[6]

Sébastien Gros and Mario Zanon. Reinforcement learning based on MPC and the stochastic policy gradient method. In 2021 American Control Conference (ACC), pages 1947–1952. 2021.

[7]

Sébastien Gros and Mario Zanon. Learning for MPC with stability & safety guarantees. Automatica, 146:110598, 2022.

[8]

Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. Neural networks for machine learning. Lecture 6a: Overview of mini-batch gradient descent, 6a:31, 2012.

[9]

Diederik P. Kingma and Jimmy Ba. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[10]

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.

[11]

Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer, 2006.

[12]

James Blake Rawlings, David Q. Mayne, and Moritz Diehl. Model Predictive Control: Theory, Computation, and Design. Nob Hill Publishing, Madison, USA, 2nd edition, 2018.

[13]

Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237, 2019.

[14]

Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the importance of initialization and momentum in deep learning. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 1139–1147. Atlanta, Georgia, USA, 17–19 Jun 2013. PMLR.

[15]

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.

[16]

Mario Zanon and Sébastien Gros. Safe reinforcement learning using robust MPC. IEEE Transactions on Automatic Control, 66(8):3638–3652, 2021.