---------------------
Inheritance hierarchy
---------------------

The main components of the MPC-RL framework are the agents. They interact with the
environment and, when they are capable of it, learn the optimal policy from these
interactions.

Before going into the details of each agent, it is important to understand how the
classes that implement the agents are organized and how they relate to one another. The
following diagram shows their inheritance scheme.

.. currentmodule:: mpcrl

.. inheritance-diagram::
   Agent
   LearningAgent
   GlobOptLearningAgent
   RlLearningAgent
   LstdDpgAgent
   LstdQLearningAgent
   :parts: 1


Callbacks
=========

Some of the classes in the diagram are outside the scope of this documentation. Note,
however, that the :class:`Agent` and :class:`LearningAgent` classes inherit from the
mixins :class:`core.callbacks.AgentCallbackMixin` and
:class:`core.callbacks.LearningAgentCallbackMixin`, respectively. These base classes are
fundamental to the implementation of the agents. They provide the backbone for hooking
other functionalities, such as updates and schedulers, into each agent and for calling
them with a specific frequency, e.g., at the end of every episode or every 100 time
steps. This mechanism is internally vital for learning agents, which need to update their
parametrization with a given frequency. End users can also benefit from these callbacks:
they make it possible to implement logic that must run when specific events occur, e.g.,
updating disturbance profiles or changing references. This topic is discussed further in
:ref:`user_guide`'s :ref:`user_guide_callbacks` and in :ref:`module_reference`'s
:ref:`module_reference_callbacks`.
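
To make the idea more concrete, the snippet below is a minimal, self-contained sketch of
the general callback-mixin pattern. It assumes a simple design in which each callback is
registered for a named event together with a firing frequency. The names
``CallbackMixinSketch``, ``ToyAgent``, ``hook`` and ``_trigger``, as well as the event
names, are hypothetical. They are **not** the API of
:class:`core.callbacks.AgentCallbackMixin`; see the module reference for the actual hooks.

.. code-block:: python

   from collections import defaultdict
   from typing import Callable


   class CallbackMixinSketch:
       """Hypothetical mixin (not the mpcrl API): fires registered callbacks on events."""

       def __init__(self) -> None:
           # event name -> list of (frequency, callback) pairs
           self._hooks = defaultdict(list)
           # event name -> number of times the event has occurred so far
           self._counts = defaultdict(int)

       def hook(self, event: str, callback: Callable[[int], None], frequency: int = 1) -> None:
           """Registers ``callback`` to run every ``frequency`` occurrences of ``event``."""
           if frequency < 1:
               raise ValueError("frequency must be a positive integer")
           self._hooks[event].append((frequency, callback))

       def _trigger(self, event: str) -> None:
           """Counts one occurrence of ``event`` and calls the callbacks that are due."""
           self._counts[event] += 1
           n = self._counts[event]
           for frequency, callback in self._hooks[event]:
               if n % frequency == 0:
                   callback(n)


   class ToyAgent(CallbackMixinSketch):
       """Hypothetical agent whose loop emits 'timestep_end' and 'episode_end' events."""

       def run(self, episodes: int, steps_per_episode: int) -> None:
           for _ in range(episodes):
               for _ in range(steps_per_episode):
                   # ... interact with the environment here ...
                   self._trigger("timestep_end")
               self._trigger("episode_end")


   # Usage: a "learning update" every 100 time steps, a "reference change" every episode.
   log = []
   agent = ToyAgent()
   agent.hook("timestep_end", lambda n: log.append(("update", n)), frequency=100)
   agent.hook("episode_end", lambda n: log.append(("new_reference", n)))
   agent.run(episodes=2, steps_per_episode=150)
   print(log)
   # [('update', 100), ('new_reference', 1), ('update', 200), ('update', 300), ('new_reference', 2)]

Here, the callback hooked to ``"timestep_end"`` mimics a learning update fired every 100
time steps. The one hooked to ``"episode_end"`` mimics a user-defined change of reference
at the end of each episode. Because the bookkeeping lives in a mixin, every agent subclass
inherits it without reimplementing it.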


Agents
======

Now, for the agents! As seen in the diagram above, the simplest agent class is
:class:`Agent`. This class implements a basic agent that can interact with an
environment but cannot learn from it. From it derives the abstract class
:class:`LearningAgent`, which introduces learning capabilities, and from the latter the
abstract :class:`RlLearningAgent`, which is oriented towards gradient-based RL solutions.
Parallel to :class:`RlLearningAgent`, the abstract :class:`GlobOptLearningAgent` defines
the layout for agents that leverage Global Optimization (i.e., gradient-free) strategies
rather than gradient-based ones. Finally, the concrete classes :class:`LstdDpgAgent` and
:class:`LstdQLearningAgent` implement the DPG and Q-learning algorithms, respectively, to
tune the parameters of the MPC controller. More details about all these agents can be
found in the next sections of the :ref:`user_guide`.
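
The relationships just described can be summarized with a bare-bones class skeleton built
on Python's :mod:`abc` module. The class names are taken from the diagram, but the
``evaluate`` and ``update`` methods and their bodies are hypothetical placeholders chosen
for illustration. They do not reproduce the signatures or the implementation of the actual
mpcrl classes.

.. code-block:: python

   import inspect
   from abc import ABC, abstractmethod


   class Agent:
       """Interacts with an environment, but does not learn from it."""

       def evaluate(self, episodes: int) -> None:
           """Hypothetical: roll out the current (fixed) policy for some episodes."""


   class LearningAgent(Agent, ABC):
       """Abstract: adds the ability to learn from interactions."""

       @abstractmethod
       def update(self) -> None:
           """Hypothetical: update the MPC parametrization from collected experience."""


   class RlLearningAgent(LearningAgent):
       """Abstract: layout for gradient-based RL agents (``update`` is still abstract)."""


   class GlobOptLearningAgent(LearningAgent):
       """Abstract: layout for gradient-free, global-optimization agents."""


   class LstdDpgAgent(RlLearningAgent):
       """Concrete: would implement the DPG update (placeholder here)."""

       def update(self) -> None:
           pass  # placeholder for the LSTD DPG parameter update


   class LstdQLearningAgent(RlLearningAgent):
       """Concrete: would implement the Q-learning update (placeholder here)."""

       def update(self) -> None:
           pass  # placeholder for the LSTD Q-learning parameter update


   for cls in (Agent, LearningAgent, RlLearningAgent, GlobOptLearningAgent,
               LstdDpgAgent, LstdQLearningAgent):
       print(f"{cls.__name__}: abstract={inspect.isabstract(cls)}")

In this sketch, the abstract ``update`` is what prevents ``LearningAgent`` and its
abstract descendants from being instantiated: only the leaf classes that supply a concrete
update rule can be created.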