Inheritance hierarchy#
The main components of the MPC-RL framework are the agents, which are responsible for interacting with the environment and, in case they are able to do so, learning the optimal policy from these interactions.
But before jumping into the details of the different agents, it is important to understand the hierarchy of the different classes that are used to implement the agents and their relationships. The following diagram shows the inheritance scheme of the different agents.

Callbacks#
While some of the classes in the diagram are outside the scope of this documentation,
let us notice that the Agent and the LearningAgent
classes inherit from the mixins core.callbacks.AgentCallbackMixin and
core.callbacks.LearningAgentCallbackMixin, respectively. These base
classes are fundamental to the implementation of the agents as they provide the backbone
for other functionalities, such as updates and schedulers, to be hooked into each agent
and be called with a specific frequency, e.g., at the end of every episode or after 100
time steps. Of course, this is internally vital for learning agents, as they need to
update their parametrization with a given frequency. Nonetheless, also end users can
benefit from these callbacks: they allow to implement logic that needs to be executed
when specific events occur, e.g., updating disturbance profiles, changing references,
etc.. This topic is discussed further in User guide’s Callbacks
and in Module reference’s Callbacks.
Agents#
Now, for the agents! As seen in the diagram above, the simplest agent class is
Agent. This class implements a basic agent that can interact with an
environment, but not learn from it. From there, the abstract classes
LearningAgent and RlLearningAgent
are derived, which introduce learning capabilities to the agents. Parallel to latter,
which is oriented towards gradient-based RL solutions, the abstract
GlobOptLearningAgent defines the layout for agents that leverage Global
Optimzation (i.e., gradient-free) strategies rather than gradient-based ones. Finally,
the concrete classes LstdDpgAgent and LstdQLearningAgent
implement the DPG and Q-learning algorithms, respectively, to tune the MPC controller’s
parameters. More details about all these agents can be found in the next sections of the
User guide.