mpcrl.core.callbacks.LearningAgentCallbackMixin

class mpcrl.core.callbacks.LearningAgentCallbackMixin

Bases: AgentCallbackMixin

Class with callbacks for learning agents.

In particular, on top of the callbacks inherited from AgentCallbackMixin, this class defines the following additional callbacks:

on_update_failure, invoked when an update of the parametrization fails
on_training_start, invoked when training starts (see mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy)
on_training_end, invoked when training ends
on_update, invoked after each update of the parametrization
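The four hooks listed above can be illustrated with a minimal, self-contained sketch. So that it runs without mpcrl installed, `LearningCallbacksStandIn` is a hypothetical stand-in that only mirrors the documented signatures as no-ops; in real code you would subclass `mpcrl.core.callbacks.LearningAgentCallbackMixin` (or an agent class that inherits from it) instead. Calling `super()` in each override is an assumed precaution to keep any default behavior of the real base class. The manual calls at the bottom only imitate one possible order of invocation and are not `mpcrl.LearningAgent.train`.

```python
class LearningCallbacksStandIn:
    """Hypothetical stand-in for LearningAgentCallbackMixin: only the four
    learning-specific hooks are mirrored, as no-ops."""

    def on_training_start(self, env):
        pass

    def on_update(self):
        pass

    def on_update_failure(self, episode, timestep, errormsg, raises):
        pass

    def on_training_end(self, env, returns):
        pass


class TrainingMonitor(LearningCallbacksStandIn):
    """Records which learning hooks fire, and in which order."""

    def __init__(self):
        self.events = []

    def on_training_start(self, env):
        super().on_training_start(env)
        self.events.append("training_start")

    def on_update(self):
        super().on_update()
        self.events.append("update")

    def on_update_failure(self, episode, timestep, errormsg, raises):
        super().on_update_failure(episode, timestep, errormsg, raises)
        self.events.append(f"update_failure@{episode}")

    def on_training_end(self, env, returns):
        super().on_training_end(env, returns)
        self.events.append("training_end")


# Hypothetical driver: imitates a possible call order, not mpcrl's train().
monitor = TrainingMonitor()
monitor.on_training_start(env=None)
monitor.on_update()
monitor.on_update_failure(episode=1, timestep=None, errormsg="update failed", raises=False)
monitor.on_update()
monitor.on_training_end(env=None, returns=[1.0, 2.0])
print(monitor.events)
# ['training_start', 'update', 'update_failure@1', 'update', 'training_end']
```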

Methods

`on_env_step`(env, episode, timestep)	Callback called after each call to `gymnasium.Env.step`.
`on_episode_end`(env, episode, rewards)	Callback called at the end of each episode in the training or evaluation process (see `mpcrl.Agent.evaluate`, `mpcrl.LearningAgent.train` and `mpcrl.LearningAgent.train_offpolicy`).
`on_episode_start`(env, episode, state)	Callback called at the beginning of each episode in the training or validation process (see `mpcrl.Agent.evaluate`, `mpcrl.LearningAgent.train` and `mpcrl.LearningAgent.train_offpolicy`).
`on_mpc_failure`(episode, timestep, status, raises)	Callback in case of failure of the MPC solver.
`on_timestep_end`(env, episode, timestep)	Callback called at the end of each time iteration.
`on_training_end`(env, returns)	Callback called at the end of the training process.
`on_training_start`(env)	Callback called at the beginning of the training process.
`on_update`()	Callback called after each `mpcrl.LearningAgent.update`.
`on_update_failure`(episode, timestep, ...)	Callback in case of update failure.
`on_validation_end`(env, returns)	Callback called at the end of the validation process (see `mpcrl.Agent.evaluate`).
`on_validation_start`(env)	Callback called at the beginning of the validation process (see `mpcrl.Agent.evaluate`).

on_env_step(env, episode, timestep)

Callback called after each call to gymnasium.Env.step.

Parameters:

env (gym env): The gym environment on which the agent is being trained.
episode (int): Number of the training episode.
timestep (int): Time instant of the current training episode.

Return type:

None
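As a minimal sketch of what `on_env_step` can be used for, the snippet below counts how many environment steps occur in each episode. `EnvStepStandIn` is a hypothetical stand-in exposing only the documented signature, and the loop at the bottom simulates the calls without a real `gymnasium` environment.

```python
from collections import Counter


class EnvStepStandIn:
    """Hypothetical stand-in exposing only the documented on_env_step hook."""

    def on_env_step(self, env, episode, timestep):
        pass


class StepCounter(EnvStepStandIn):
    """Counts the gymnasium.Env.step calls observed in each episode."""

    def __init__(self):
        self.steps_per_episode = Counter()

    def on_env_step(self, env, episode, timestep):
        super().on_env_step(env, episode, timestep)
        self.steps_per_episode[episode] += 1


counter = StepCounter()
# Simulate two episodes of 3 and 5 steps (no real environment involved).
for episode, length in enumerate((3, 5)):
    for timestep in range(length):
        counter.on_env_step(env=None, episode=episode, timestep=timestep)
print(dict(counter.steps_per_episode))  # {0: 3, 1: 5}
```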

on_episode_end(env, episode, rewards)

Callback called at the end of each episode in the training or evaluation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).

Parameters:

env (gym env): The gym environment on which the agent is being trained.
episode (int): Number of the training episode.
rewards (float): Cumulative reward of this episode.

Return type:

None
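Since `rewards` carries the cumulative reward of the finished episode, `on_episode_end` is a natural place to track learning progress. The sketch below keeps a moving average over the last few episodes; `EpisodeEndStandIn` is a hypothetical stand-in for the mixin, and the window size is an arbitrary illustrative choice.

```python
from collections import deque


class EpisodeEndStandIn:
    """Hypothetical stand-in exposing only the documented on_episode_end hook."""

    def on_episode_end(self, env, episode, rewards):
        pass


class ReturnWindow(EpisodeEndStandIn):
    """Keeps the cumulative rewards of the last `window` episodes."""

    def __init__(self, window=2):
        self.recent = deque(maxlen=window)  # older entries are dropped automatically

    def on_episode_end(self, env, episode, rewards):
        super().on_episode_end(env, episode, rewards)
        self.recent.append(rewards)

    def moving_average(self):
        return sum(self.recent) / len(self.recent)


tracker = ReturnWindow(window=2)
for episode, total in enumerate([-10.0, -4.0, -6.0]):
    tracker.on_episode_end(env=None, episode=episode, rewards=total)
print(tracker.moving_average())  # -5.0
```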

on_episode_start(env, episode, state)

Callback called at the beginning of each episode in the training or validation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).

Parameters:

env (gym env): The gym environment on which the agent is being trained.
episode (int): Number of the training episode.
state (ObsType): Starting state for this episode.

Return type:

None
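A minimal sketch of `on_episode_start`: recording each episode's starting state, for instance to inspect the initial conditions produced by the environment's reset. `EpisodeStartStandIn` is a hypothetical stand-in, and plain tuples replace the real `ObsType` observations for simplicity.

```python
import copy


class EpisodeStartStandIn:
    """Hypothetical stand-in exposing only the documented on_episode_start hook."""

    def on_episode_start(self, env, episode, state):
        pass


class InitialStateLogger(EpisodeStartStandIn):
    """Remembers the starting state of every episode."""

    def __init__(self):
        self.initial_states = {}

    def on_episode_start(self, env, episode, state):
        super().on_episode_start(env, episode, state)
        # Store a copy, in case the caller later mutates the state object in place.
        self.initial_states[episode] = copy.deepcopy(state)


logger = InitialStateLogger()
logger.on_episode_start(env=None, episode=0, state=(0.0, 1.0))
logger.on_episode_start(env=None, episode=1, state=(0.5, -1.0))
print(logger.initial_states[1])  # (0.5, -1.0)
```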

on_mpc_failure(episode, timestep, status, raises)

Callback in case of failure of the MPC solver.

Parameters:

episode (int): Number of the episode in which the failure happened.
timestep (int or None): Time step of the current episode at which the failure happened. Can be None if the failure occurs between episodes or if no notion of time step is available.
status (str): Status of the solver that failed.
raises (bool): Whether the failure should be raised as an exception (True) or issued as a warning (False).

Return type:

None
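The `raises` flag suggests that a handler either raises or warns. The sketch below assumes that behavior in a hypothetical `MpcFailureStandIn` base class (it is inferred from the parameter description, not taken from mpcrl's source) and adds a subclass that counts failures per solver status before deferring to the base class. The status strings are example values in the style of IPOPT return statuses.

```python
import warnings


class MpcFailureStandIn:
    """Hypothetical stand-in: raises or warns depending on `raises`
    (assumed behavior, not mpcrl's actual implementation)."""

    def on_mpc_failure(self, episode, timestep, status, raises):
        msg = f"MPC solver failed at episode {episode}, timestep {timestep}: {status}"
        if raises:
            raise RuntimeError(msg)
        warnings.warn(msg, RuntimeWarning)


class FailureCounter(MpcFailureStandIn):
    """Counts solver failures per status, then defers to the base class."""

    def __init__(self):
        self.failures = {}

    def on_mpc_failure(self, episode, timestep, status, raises):
        # Count first, so that failures which end up raising are also recorded.
        self.failures[status] = self.failures.get(status, 0) + 1
        super().on_mpc_failure(episode, timestep, status, raises)


cb = FailureCounter()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cb.on_mpc_failure(episode=3, timestep=None, status="Infeasible_Problem_Detected", raises=False)
print(cb.failures, len(caught))  # {'Infeasible_Problem_Detected': 1} 1

try:
    cb.on_mpc_failure(episode=3, timestep=7, status="Maximum_Iterations_Exceeded", raises=True)
except RuntimeError as err:
    print("raised:", err)
```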

on_timestep_end(env, episode, timestep)

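Callback called at the end of each time iteration. It is called as often as on_env_step (once per time step), but at a different point: on_env_step is called right after gymnasium.Env.step, whereas on_timestep_end is called once the whole iteration is over.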

Parameters:

env (gym env): The gym environment on which the agent is being trained.
episode (int): Number of the training episode.
timestep (int): Time instant of the current training episode.

Return type:

None
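To make the relative timing of `on_env_step` and `on_timestep_end` concrete, the sketch below records the order in which they fire. `fake_episode` is a hypothetical loop: the actual `env.step` call and the agent's per-iteration processing are only indicated by comments, so it shows a plausible ordering rather than mpcrl's exact internals.

```python
class TimestepStandIn:
    """Hypothetical stand-in exposing the two per-step hooks as no-ops."""

    def on_env_step(self, env, episode, timestep):
        pass

    def on_timestep_end(self, env, episode, timestep):
        pass


class OrderRecorder(TimestepStandIn):
    """Logs each hook call together with its time step."""

    def __init__(self):
        self.log = []

    def on_env_step(self, env, episode, timestep):
        super().on_env_step(env, episode, timestep)
        self.log.append(("env_step", timestep))

    def on_timestep_end(self, env, episode, timestep):
        super().on_timestep_end(env, episode, timestep)
        self.log.append(("timestep_end", timestep))


def fake_episode(cb, episode, n_steps):
    """Hypothetical time loop, not mpcrl's implementation."""
    for t in range(n_steps):
        # ... compute the action and call env.step(action) ...
        cb.on_env_step(None, episode, t)
        # ... agent-specific processing for this iteration ...
        cb.on_timestep_end(None, episode, t)


rec = OrderRecorder()
fake_episode(rec, episode=0, n_steps=2)
print(rec.log)
# [('env_step', 0), ('timestep_end', 0), ('env_step', 1), ('timestep_end', 1)]
```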

on_training_end(env, returns)

Callback called at the end of the training process.

Parameters:

env (gym env): The gym environment on which the agent has been trained.
returns (array of double): The cumulative reward of each episode.

Return type:

None
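Because `returns` holds one cumulative reward per episode, `on_training_end` can summarize a whole training run. The sketch below uses only the standard library; `TrainingEndStandIn` is a hypothetical stand-in, and converting each entry with `float` is meant to accept either a list or a 1-D NumPy array.

```python
import statistics


class TrainingEndStandIn:
    """Hypothetical stand-in exposing only the documented on_training_end hook."""

    def on_training_end(self, env, returns):
        pass


class TrainingSummary(TrainingEndStandIn):
    """Stores a small summary of the per-episode returns."""

    def __init__(self):
        self.summary = None

    def on_training_end(self, env, returns):
        super().on_training_end(env, returns)
        values = [float(r) for r in returns]  # list or 1-D array of doubles
        self.summary = {
            "episodes": len(values),
            "mean": statistics.fmean(values),
            "last": values[-1],
        }


cb = TrainingSummary()
cb.on_training_end(env=None, returns=[-12.0, -9.0, -6.0])
print(cb.summary)  # {'episodes': 3, 'mean': -9.0, 'last': -6.0}
```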

on_training_start(env)

Callback called at the beginning of the training process.

Parameters:

env (gym env): The gym environment on which the agent is being trained.

Return type:

None
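`on_training_start` is a convenient place to reset per-run state. As a minimal sketch, the class below pairs it with `on_training_end` to time a training run; `TrainingHooksStandIn` is a hypothetical stand-in, and `time.sleep` merely stands in for the actual training work.

```python
import time


class TrainingHooksStandIn:
    """Hypothetical stand-in mirroring the two training hooks as no-ops."""

    def on_training_start(self, env):
        pass

    def on_training_end(self, env, returns):
        pass


class TrainingTimer(TrainingHooksStandIn):
    """Measures the wall-clock duration of a training run."""

    def __init__(self):
        self._t0 = None
        self.elapsed = None

    def on_training_start(self, env):
        super().on_training_start(env)
        # Reset per-run state so that repeated training runs do not mix.
        self._t0 = time.perf_counter()
        self.elapsed = None

    def on_training_end(self, env, returns):
        super().on_training_end(env, returns)
        self.elapsed = time.perf_counter() - self._t0


timer = TrainingTimer()
timer.on_training_start(env=None)
time.sleep(0.01)  # placeholder for the actual training work
timer.on_training_end(env=None, returns=[])
print(f"training took {timer.elapsed:.3f} s")
```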

on_update()

Callback called after each mpcrl.LearningAgent.update.

This callback is especially useful for, e.g., decaying exploration probabilities or learning rates.

Return type:

None
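Following the suggestion above, the sketch below decays an exploration probability and a learning rate after every update. `UpdateStandIn` is a hypothetical stand-in, and the attribute names, decay factors, and lower bound are illustrative assumptions rather than mpcrl settings.

```python
class UpdateStandIn:
    """Hypothetical stand-in exposing only the documented on_update hook."""

    def on_update(self):
        pass


class DecayingSchedule(UpdateStandIn):
    """Multiplies a hypothetical exploration probability and learning rate
    by fixed factors after each update; epsilon never drops below eps_min."""

    def __init__(self, epsilon=0.5, lr=0.1, eps_decay=0.9, lr_decay=0.5, eps_min=0.05):
        self.epsilon = epsilon
        self.lr = lr
        self.eps_decay = eps_decay
        self.lr_decay = lr_decay
        self.eps_min = eps_min

    def on_update(self):
        super().on_update()
        self.epsilon = max(self.eps_min, self.epsilon * self.eps_decay)
        self.lr *= self.lr_decay


sched = DecayingSchedule()
for _ in range(3):  # pretend three updates took place
    sched.on_update()
print(sched.lr)  # 0.0125
```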

on_update_failure(episode, timestep, errormsg, raises)

Callback in case of update failure.

Parameters:

episode (int): Number of the episode in which the failure happened.
timestep (int or None): Time step of the current episode at which the failure happened. Can be None if the update occurs between episodes or if no notion of time step is available.
errormsg (str): Error message of the update failure.
raises (bool): Whether the failure should be raised as an exception (True) or issued as a warning (False).

Return type: