.. _module_reference:

================
Module reference
================

This page contains all the detailed information about the modules and classes in
:mod:`mpcrl`. First, we present the core components of the library that allow us to
easily implement Reinforcement Learning algorithms. Then, we move to the agents
themselves, which contain these algorithms and deploy them to control the given
environments (and possibly learn from interacting with them). Finally, we report the
different optimization strategies that can be used to update the parameters of the MPC
controller, as well as the utility functions and wrappers that can be used to enhance
the behaviour of the agents.

---------------
Core components
---------------

Before jumping into the details of the agents and their Reinforcement Learning
algorithms, we present here the core elements that are used during training and
evaluation, but are not the agents themselves.

.. automodule:: mpcrl.core

We start with the building blocks of the library, and only then move to the other core
elements, i.e., those that allow specifying the hyperparameters of our agents.

Building blocks
===============

In this section, we present the building blocks of the package, which are at the core
of the internal workings of the agents and their learning algorithms. These include the
callback mechanisms, the learnable parameters, the scheduling quantities, and our
custom exceptions and warnings.

.. _module_reference_callbacks:

Callbacks
---------

.. automodule:: mpcrl.core.callbacks

.. autosummary::
   :toctree: generated
   :template: class.rst
   :caption: Callbacks
   :nosignatures:

   CallbackMixin
   AgentCallbackMixin
   LearningAgentCallbackMixin

Learnable parameters
--------------------

.. automodule:: mpcrl.core.parameters

.. currentmodule:: mpcrl

.. autosummary::
   :toctree: generated
   :template: class.rst
   :caption: Learnable parameters
   :nosignatures:

   LearnableParameter
   LearnableParametersDict

Scheduling quantities
---------------------

What if you need to decay or increase your learning rate over time during training?
The following submodule provides a set of schedulers that can be used to update or
decay different quantities, such as the learning rate or the exploration probability,
over time. Most of the agents accept a scheduler as an argument in place of a fixed
value, and the scheduled quantity is then updated in the user-specified way.

.. currentmodule:: mpcrl.core

.. autosummary::
   :toctree: generated
   :template: module.rst
   :caption: Scheduling quantities

   schedulers

Exceptions
----------

Finally, we also provide custom warnings and exceptions to signal two distinct and
important events, namely, when the MPC solver fails to find a solution, and when the
update fails (usually because the QP solver fails to find a solution). Since the
methods :meth:`mpcrl.Agent.evaluate`, :meth:`mpcrl.LearningAgent.train` and
:meth:`mpcrl.LearningAgent.train_offpolicy` accept the ``raises`` argument, we provide
here both warnings and exceptions that can be raised in case of failures, depending on
the value of said flag. We also provide two utility functions to conveniently raise
these exceptions or warnings.

.. autosummary::
   :toctree: generated
   :template: module.rst
   :caption: Exceptions

   errors

Hyperparameters
===============

Update strategy
---------------

.. automodule:: mpcrl.core.update

.. currentmodule:: mpcrl

.. autosummary::
   :toctree: generated
   :template: class.rst
   :caption: Update strategy
   :nosignatures:

   UpdateStrategy

Experience replay
-----------------

.. automodule:: mpcrl.core.experience

.. currentmodule:: mpcrl

.. autosummary::
   :toctree: generated
   :template: class.rst
   :caption: Experience replay
   :nosignatures:

   ExperienceReplay

Exploring
---------

.. automodule:: mpcrl.core.exploration

.. autosummary::
   :toctree: generated
   :template: class.rst
   :caption: Exploring
   :nosignatures:

   ExplorationStrategy
   NoExploration
   GreedyExploration
   EpsilonGreedyExploration
   OrnsteinUhlenbeckExploration
   StepWiseExploration

Warmstarting the MPC solvers
----------------------------

.. automodule:: mpcrl.core.warmstart

.. currentmodule:: mpcrl

.. autosummary::
   :toctree: generated
   :template: class.rst
   :caption: Warmstarting the MPC solvers
   :nosignatures:

   WarmStartStrategy

.. _module_reference_agents:

------
Agents
------

Agents are the main and, arguably, the most important components of the package. They
deploy the control policies to control the given environments and, if they are
learning-based, also implement the underlying learning algorithm to tune the parameters
of the control policies.

.. currentmodule:: mpcrl

.. inheritance-diagram:: Agent LearningAgent GlobOptLearningAgent RlLearningAgent LstdDpgAgent LstdQLearningAgent
   :parts: 1

Base agents
===========

What follows are the base classes for the agents in the package. These are either
non-learning agents (i.e., :class:`Agent`) or abstract learning agents that provide the
layout for inheriting classes.

.. autosummary::
   :toctree: generated
   :template: class.rst
   :caption: Base agents
   :nosignatures:

   Agent
   LearningAgent
   RlLearningAgent

Reinforcement Learning agents
=============================

These are the learning agents that leverage a reinforcement learning algorithm to tune
the parametrization of the MPC controller. Two common algorithms are implemented here:
Q-learning and Deterministic Policy Gradient (DPG).

.. autosummary::
   :toctree: generated
   :template: class.rst
   :caption: Reinforcement Learning agents
   :nosignatures:

   LstdDpgAgent
   LstdQLearningAgent

Other learning agents
=====================

We also provide other learning agents that do not use gradient-based approaches to
update their parameters, but rather rely on global gradient-free optimization
techniques. See also :class:`optim.GradientFreeOptimizer`.

.. autosummary::
   :toctree: generated
   :template: class.rst
   :caption: Other learning agents
   :nosignatures:

   GlobOptLearningAgent

----------
Optimizers
----------

.. automodule:: mpcrl.optim

.. inheritance-diagram:: mpcrl.optim.base_optimizer.BaseOptimizer GradientFreeOptimizer GradientBasedOptimizer Adam GradientDescent NewtonMethod RMSprop
   :parts: 1

Base optimizers
===============

These are the abstract base optimizer classes that lay the skeleton for the
gradient-based updates of the MPC parametrization. We also offer an interface for
gradient-free optimizers, which can be used to tune the parameters of the MPC
controller via global optimization strategies such as Bayesian Optimization.

.. autosummary::
   :toctree: generated
   :template: class.rst
   :caption: Base optimizers
   :nosignatures:

   mpcrl.optim.base_optimizer.BaseOptimizer
   GradientBasedOptimizer
   GradientFreeOptimizer

Gradient-based optimizers
=========================

Reported here are the concrete implementations of the gradient-based optimizers that
can be used to update the parameters of the MPC controller. They include both
first-order and second-order methods, depending on whether they use only gradient
information or also curvature information (i.e., the Jacobian and Hessian of some
quantity w.r.t. the parameters).

.. autosummary::
   :toctree: generated
   :template: class.rst
   :caption: Gradient-based optimizers
   :nosignatures:

   GradientDescent
   NewtonMethod
   Adam
   RMSprop

----------------
Other submodules
----------------

.. currentmodule:: mpcrl

:mod:`mpcrl` offers a few other components that are not explicitly needed by the agents
and their core functionalities, but can be useful to enhance the base behaviour of
agents via wrappers, or to provide additional methods for, e.g., designing LQR
controllers. To this end, we provide a few utility wrapper classes and utility methods
in the following submodules.

.. autosummary::
   :toctree: generated
   :template: module.rst
   :caption: Other components

   wrappers
   util