mpcrl.core.callbacks.LearningAgentCallbackMixin
- class mpcrl.core.callbacks.LearningAgentCallbackMixin
Bases: AgentCallbackMixin
Class with callbacks for learning agents.
In particular, this class defines, on top of the callbacks from AgentCallbackMixin, the following additional callbacks:
- on_update_failure, invoked when an update of the parametrization fails
- on_training_start, invoked when training starts (see mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy)
- on_training_end, invoked when training ends
- on_update, invoked after each update of the parametrization.
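To make the four additional callbacks concrete, the sketch below uses a hypothetical stand-in class (`RecordingCallbacks`, not part of mpcrl) whose methods have the same names and signatures as those listed on this page and simply record that they were invoked. The driver `fake_training_run` is likewise an assumption made for illustration; in the library itself the callbacks are invoked by the agent's training methods, and the exact order is decided there.

```python
class RecordingCallbacks:
    """Hypothetical stand-in exposing the learning-agent callback names."""

    def __init__(self):
        self.events = []

    def on_training_start(self, env):
        self.events.append("training_start")

    def on_update(self):
        self.events.append("update")

    def on_update_failure(self, episode, timestep, errormsg, raises):
        self.events.append(f"update_failure@{episode}:{timestep}")

    def on_training_end(self, env, returns):
        self.events.append(f"training_end({len(returns)} episodes)")


def fake_training_run(callbacks, n_episodes=2):
    """Illustrative driver only; it does not reproduce mpcrl's training loop."""
    env = object()  # placeholder for a gymnasium environment
    callbacks.on_training_start(env)
    returns = []
    for episode in range(n_episodes):
        returns.append(0.0)  # dummy cumulative reward
        if episode == 0:
            callbacks.on_update()  # pretend this update succeeded
        else:
            # pretend this update failed between episodes (timestep is None)
            callbacks.on_update_failure(episode, None, "dummy failure", raises=False)
    callbacks.on_training_end(env, returns)
    return callbacks.events


events = fake_training_run(RecordingCallbacks())
print(events)
```

Recording events in a list like this is a cheap way to check, in a test, that a custom callback is actually being reached.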
Methods
on_env_step(env, episode, timestep): Callback called after each call to gymnasium.Env.step.
on_episode_end(env, episode, rewards): Callback called at the end of each episode in the training or evaluation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).
on_episode_start(env, episode, state): Callback called at the beginning of each episode in the training or validation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).
on_mpc_failure(episode, timestep, status, raises): Callback in case of failure of the MPC solver.
on_timestep_end(env, episode, timestep): Callback called at the end of each time iteration.
on_training_end(env, returns): Callback called at the end of the training process.
on_training_start(env): Callback called at the beginning of the training process.
on_update(): Callback called after each mpcrl.LearningAgent.update.
on_update_failure(episode, timestep, ...): Callback in case of update failure.
on_validation_end(env, returns): Callback called at the end of the validation process (see mpcrl.Agent.evaluate).
on_validation_start(env): Callback called at the beginning of the validation process (see mpcrl.Agent.evaluate).
- on_env_step(env, episode, timestep)
Callback called after each call to gymnasium.Env.step.
- Parameters:
- env : gym env
A gym environment on which the agent is being trained.
- episode : int
Number of the training episode.
- timestep : int
Time instant of the current training episode.
- Return type:
- on_episode_end(env, episode, rewards)
Callback called at the end of each episode in the training or evaluation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).
- Parameters:
- env : gym env
A gym environment on which the agent is being trained.
- episode : int
Number of the training episode.
- rewards : float
Cumulative rewards for this episode.
- Return type:
- on_episode_start(env, episode, state)
Callback called at the beginning of each episode in the training or validation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).
- Parameters:
- env : gym env
A gym environment on which the agent is being trained.
- episode : int
Number of the training episode.
- state : ObsType
Starting state for this episode.
- Return type:
- on_mpc_failure(episode, timestep, status, raises)
Callback in case of failure of the MPC solver.
- Parameters:
- episode : int
Number of the episode when the failure happened.
- timestep : int or None
Timestep of the current episode when the failure happened. Can be None if the error occurs between episodes or no notion of time step is available.
- status : str
Status of the solver that failed.
- raises : bool
Whether the failure should be raised as an exception (True) or as a warning (False).
- Return type:
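For on_mpc_failure, the raises flag selects between two reporting modes. A minimal sketch of how a handler could act on it, assuming a plain `RuntimeError` and Python's standard `warnings` module (mpcrl may use its own exception and warning classes, and `report_mpc_failure` is a hypothetical helper, not a library function):

```python
import warnings


def report_mpc_failure(episode, timestep, status, raises):
    """Hypothetical helper mirroring the on_mpc_failure signature."""
    # timestep may be None when no notion of time step is available
    if timestep is None:
        where = f"episode {episode}"
    else:
        where = f"episode {episode}, timestep {timestep}"
    msg = f"MPC solver failed at {where} with status '{status}'."
    if raises:
        raise RuntimeError(msg)  # assumption: a generic exception type
    warnings.warn(msg, RuntimeWarning)  # assumption: a generic warning category
    return msg
```

With `raises=False` the run continues and the message surfaces as a warning; with `raises=True` the caller has to catch the exception or let training stop.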
- on_timestep_end(env, episode, timestep)
Callback called at the end of each time iteration. It is called with the same frequency as on_env_step, but with different timing.
- Parameters:
- env : gym env
A gym environment on which the agent is being trained.
- episode : int
Number of the training episode.
- timestep : int
Time instant of the current training episode.
- Return type:
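The difference in timing between on_env_step and on_timestep_end can be pictured with the following sketch. It only logs labels in one plausible order; what happens between the two callbacks (the "per-step work" entry) is an assumption for illustration, not a description of the library's loop.

```python
def run_episode(n_steps, log):
    """Illustrative loop; the work between the two callbacks is an assumption."""
    for timestep in range(n_steps):
        log.append(("env.step", timestep))         # stand-in for env.step(action)
        log.append(("on_env_step", timestep))      # fires right after the step
        log.append(("per-step work", timestep))    # hypothetical remaining work
        log.append(("on_timestep_end", timestep))  # fires when the iteration ends
    return log


log = run_episode(2, [])
for entry in log:
    print(entry)
```

Both callbacks fire once per time step; they differ only in where in the iteration they sit.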
- on_training_end(env, returns)
Callback called at the end of the training process.
- Parameters:
- env : gym env
A gym environment on which the agent has been trained.
- returns : array of double
Each episode’s cumulative rewards.
- Return type:
- on_training_start(env)
Callback called at the beginning of the training process.
- Parameters:
- env : gym env
A gym environment on which the agent is being trained.
- Return type:
- on_update()
Callback called after each mpcrl.LearningAgent.update.
This callback is especially useful for, e.g., decaying exploration probabilities or learning rates.
- Return type:
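As an illustration of the decay use case mentioned for on_update, the sketch below keeps an exploration probability on a hypothetical class and multiplies it by a fixed factor each time `on_update` fires. The class name, attribute names, and decay rule are all assumptions; mpcrl provides its own exploration and scheduling objects, which are not shown here.

```python
class DecayOnUpdate:
    """Hypothetical holder of an exploration probability decayed on each update."""

    def __init__(self, epsilon=1.0, decay_rate=0.5, min_epsilon=0.0):
        self.epsilon = epsilon
        self.decay_rate = decay_rate
        self.min_epsilon = min_epsilon

    def on_update(self):
        # Exponential decay, clipped from below at min_epsilon.
        self.epsilon = max(self.min_epsilon, self.epsilon * self.decay_rate)


agent = DecayOnUpdate(epsilon=1.0, decay_rate=0.5, min_epsilon=0.2)
for _ in range(3):
    agent.on_update()
print(agent.epsilon)  # 1.0 -> 0.5 -> 0.25 -> 0.2 (clipped)
```

Tying the decay to updates rather than to time steps means the schedule advances only when the parametrization actually changes.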
- on_update_failure(episode, timestep, errormsg, raises)
Callback in case of update failure.
- Parameters:
- episode : int
Number of the episode when the failure happened.
- timestep : int or None
Timestep of the current episode when the failure happened. Can be None if the update occurs between episodes or no notion of time step is available.
- errormsg : str
Error message of the update failure.
- raises : bool
Whether the failure should be raised as an exception (True) or as a warning (False).
- Return type:
- on_validation_end(env, returns)
Callback called at the end of the validation process (see mpcrl.Agent.evaluate).
- Parameters:
- env : gym env
A gym environment on which the agent has been validated.
- returns : array of double
Each episode’s cumulative rewards.
- Return type:
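Since on_validation_end receives one cumulative reward per episode, a natural use is to reduce that array to summary statistics. A minimal sketch, assuming the returns can be iterated as floats; `summarize_returns` is a hypothetical helper and uses only the standard library:

```python
from statistics import fmean, pstdev


def summarize_returns(returns):
    """Hypothetical helper: mean and population std of per-episode returns."""
    values = [float(r) for r in returns]
    return fmean(values), pstdev(values)


mean_return, std_return = summarize_returns([10.0, 12.0, 14.0])
print(f"mean={mean_return:.2f}, std={std_return:.2f}")
```

A custom callback could call such a helper and log the result after each validation run.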
- on_validation_start(env)
Callback called at the beginning of the validation process (see mpcrl.Agent.evaluate).
- Parameters:
- env : gym env
A gym environment on which the agent is being validated.
- Return type: