Gradient-based on-policy learning agents

The following examples show how to use gradient-based reinforcement learning techniques, in particular Q-learning and Deterministic Policy Gradient (DPG), to train a Model Predictive Control (MPC) scheme on a simple task in an on-policy fashion.
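
As a rough illustration of what a gradient-based Q-learning update looks like, the sketch below performs one semi-gradient step on a parametric action-value function for a cost-minimising agent. It is not the API used in the examples: `features`, `q_value` and `q_learning_step` are hypothetical names, and the quadratic feature vector is only a stand-in for an MPC-derived Q-function, where the value would instead come from solving the MPC problem and the gradient from its parametric sensitivities.

```python
import numpy as np


def features(s, a):
    # Hypothetical quadratic features; a stand-in for an MPC-based Q-function.
    return np.array([s * s, s * a, a * a, 1.0])


def q_value(theta, s, a):
    # Linear-in-parameters approximation Q_theta(s, a) = theta^T phi(s, a).
    return float(theta @ features(s, a))


def q_learning_step(theta, s, a, cost, s_next, actions, gamma=0.9, lr=0.1):
    """One semi-gradient Q-learning update for a cost-minimising agent."""
    # Greedy value of the next state over a finite set of candidate actions.
    v_next = min(q_value(theta, s_next, a_next) for a_next in actions)
    # Temporal-difference error: stage cost plus discounted next value minus Q.
    td_error = cost + gamma * v_next - q_value(theta, s, a)
    # For a linear Q, the gradient w.r.t. theta is the feature vector itself.
    theta_new = theta + lr * td_error * features(s, a)
    return theta_new, td_error


theta = np.zeros(4)
theta, td_error = q_learning_step(
    theta, s=1.0, a=0.5, cost=2.0, s_next=0.5, actions=[-1.0, 0.0, 1.0]
)
```

The step moves `theta` along the gradient of the Q-function at the visited state-action pair, scaled by the temporal-difference error; repeating it over observed transitions is the basic loop that the gradient-based agents build on.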