mpcrl.core.callbacks.LearningAgentCallbackMixin

class mpcrl.core.callbacks.LearningAgentCallbackMixin

Bases: AgentCallbackMixin

Class with callbacks for learning agents.

In particular, on top of the callbacks inherited from AgentCallbackMixin, this class defines the following additional callbacks: on_training_start, on_training_end, on_update and on_update_failure.

Methods

on_env_step(env, episode, timestep)

Callback called after each call to gymnasium.Env.step.

on_episode_end(env, episode, rewards)

Callback called at the end of each episode in the training or evaluation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).

on_episode_start(env, episode, state)

Callback called at the beginning of each episode in the training or validation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).

on_mpc_failure(episode, timestep, status, raises)

Callback in case of failure of the MPC solver.

on_timestep_end(env, episode, timestep)

Callback called at the end of each time iteration.

on_training_end(env, returns)

Callback called at the end of the training process.

on_training_start(env)

Callback called at the beginning of the training process.

on_update()

Callback called after each mpcrl.LearningAgent.update.

on_update_failure(episode, timestep, ...)

Callback in case of update failure.

on_validation_end(env, returns)

Callback called at the end of the validation process (see mpcrl.Agent.evaluate).

on_validation_start(env)

Callback called at the beginning of the validation process (see mpcrl.Agent.evaluate).

on_env_step(env, episode, timestep)

Callback called after each call to gymnasium.Env.step.

Parameters:
env : gym env

The gym environment on which the agent is being trained.

episode : int

Number of the training episode.

timestep : int

Time instant of the current training episode.

Return type:

None

on_episode_end(env, episode, rewards)

Callback called at the end of each episode in the training or evaluation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).

Parameters:
env : gym env

The gym environment on which the agent is being trained.

episode : int

Number of the training episode.

rewards : float

Cumulative rewards for this episode.

Return type:

None

on_episode_start(env, episode, state)

Callback called at the beginning of each episode in the training or validation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).

Parameters:
env : gym env

The gym environment on which the agent is being trained.

episode : int

Number of the training episode.

state : ObsType

Starting state for this episode.

Return type:

None

on_mpc_failure(episode, timestep, status, raises)

Callback in case of failure of the MPC solver.

Parameters:
episode : int

Number of the episode in which the failure happened.

timestep : int or None

Timestep of the current episode at which the failure happened. Can be None if the error occurs inter-episodically or no notion of time step is available.

status : str

Status of the solver that failed.

raises : bool

Whether the failure should be raised as an exception (True) or issued as a warning (False).

Return type:

None
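The meaning of the raises flag can be illustrated with a minimal sketch. The class FailureReporter below is a hypothetical stand-in that only mirrors the documented signature; it does not import mpcrl, and the use of RuntimeError and RuntimeWarning is an assumption made for illustration, not necessarily what the library itself raises.

```python
import warnings


class FailureReporter:
    """Hypothetical stand-in (not part of mpcrl) mirroring on_mpc_failure."""

    def on_mpc_failure(self, episode, timestep, status, raises):
        # timestep may be None, e.g. when the failure occurs between episodes.
        where = f"episode {episode}"
        if timestep is not None:
            where += f", timestep {timestep}"
        message = f"MPC failure at {where}: {status}"
        if raises:
            # Assumption: a generic RuntimeError stands in for the real exception.
            raise RuntimeError(message)
        # Assumption: a generic RuntimeWarning stands in for the real warning.
        warnings.warn(message, RuntimeWarning)
```

With raises=False the failure is only reported and execution continues; with raises=True the caller has to handle the exception.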

on_timestep_end(env, episode, timestep)

Callback called at the end of each time iteration. It is called with the same frequency as on_env_step, but with different timing.

Parameters:
env : gym env

The gym environment on which the agent is being trained.

episode : int

Number of the training episode.

timestep : int

Time instant of the current training episode.

Return type:

None
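The relationship between on_env_step and on_timestep_end can be sketched with a toy loop. Everything below is hypothetical: CallRecorder and run_toy_episode are illustrative names, no real environment is stepped, and the exact point at which the library invokes each callback within a time step is an assumption. The sketch only shows that both callbacks fire once per time step, at different moments.

```python
class CallRecorder:
    """Hypothetical stand-in (not part of mpcrl) that records callback calls."""

    def __init__(self):
        self.calls = []

    def on_env_step(self, env, episode, timestep):
        self.calls.append(("on_env_step", episode, timestep))

    def on_timestep_end(self, env, episode, timestep):
        self.calls.append(("on_timestep_end", episode, timestep))


def run_toy_episode(agent, env=None, episode=0, n_steps=3):
    """Toy loop; assumes on_env_step fires before on_timestep_end."""
    for timestep in range(n_steps):
        # A real loop would call env.step(action) here.
        agent.on_env_step(env, episode, timestep)
        # Any remaining per-step work (e.g. storing experience) would go here.
        agent.on_timestep_end(env, episode, timestep)
```

Calling run_toy_episode(CallRecorder()) leaves an alternating sequence of the two callback names in calls, one pair per time step.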

on_training_end(env, returns)

Callback called at the end of the training process.

Parameters:
env : gym env

The gym environment on which the agent has been trained.

returns : array of double

Each episode’s cumulative rewards.

Return type:

None
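As a minimal sketch of what an on_training_end override might do with returns, the hypothetical class below summarizes the per-episode cumulative rewards. ReturnsSummary is an illustrative name that does not come from mpcrl, and returns is assumed to be any sequence of numbers so that the example runs with the standard library only.

```python
from statistics import mean


class ReturnsSummary:
    """Hypothetical stand-in (not part of mpcrl) mirroring on_training_end."""

    def __init__(self):
        self.summary = None

    def on_training_end(self, env, returns):
        # Convert to plain floats so that array-like inputs are handled too.
        values = [float(r) for r in returns]
        self.summary = {
            "episodes": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
```

The env argument is unused here; it is kept only to match the documented signature.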

on_training_start(env)

Callback called at the beginning of the training process.

Parameters:
env : gym env

The gym environment on which the agent is being trained.

Return type:

None

on_update()

Callback called after each mpcrl.LearningAgent.update.

This callback is especially useful for, e.g., decaying exploration probabilities or learning rates.

Return type:

None
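The decay use case mentioned above can be sketched as follows. DecayOnUpdate and its attributes epsilon, decay and floor are hypothetical names chosen for illustration; they are not mpcrl's exploration or learning-rate API, and the multiplicative schedule is only one possible choice.

```python
class DecayOnUpdate:
    """Hypothetical stand-in (not part of mpcrl) mirroring on_update."""

    def __init__(self, epsilon=1.0, decay=0.9, floor=0.05):
        self.epsilon = epsilon  # current exploration probability
        self.decay = decay      # multiplicative factor applied per update
        self.floor = floor      # lower bound that epsilon never goes below

    def on_update(self):
        # Shrink epsilon after each update, but never below the floor.
        self.epsilon = max(self.floor, self.epsilon * self.decay)
```

Because the callback takes no arguments, any quantity to be decayed has to be reachable from the object itself, as epsilon is here.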

on_update_failure(episode, timestep, errormsg, raises)

Callback in case of update failure.

Parameters:
episode : int

Number of the episode in which the failure happened.

timestep : int or None

Timestep of the current episode at which the failure happened. Can be None if the update occurs inter-episodically or no notion of time step is available.

errormsg : str

Error message of the update failure.

raises : bool

Whether the failure should be raised as an exception (True) or issued as a warning (False).

Return type:

None

on_validation_end(env, returns)

Callback called at the end of the validation process (see mpcrl.Agent.evaluate).

Parameters:
env : gym env

The gym environment on which the agent has been validated.

returns : array of double

Each episode’s cumulative rewards.

Return type:

None

on_validation_start(env)

Callback called at the beginning of the validation process (see mpcrl.Agent.evaluate).

Parameters:
env : gym env

The gym environment on which the agent is being validated.

Return type:

None

Examples using mpcrl.core.callbacks.LearningAgentCallbackMixin

Off-policy Q-learning

On-policy Deterministic Policy Gradient

On-policy Q-learning