mpcrl.core.callbacks.AgentCallbackMixin

class mpcrl.core.callbacks.AgentCallbackMixin

Bases: CallbackMixin

Class with callbacks for agents.

In particular, this class defines the following callbacks:

Methods

on_env_step(env, episode, timestep)

Callback called after each call to gymnasium.Env.step.

on_episode_end(env, episode, rewards)

Callback called at the end of each episode in the training or evaluation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).

on_episode_start(env, episode, state)

Callback called at the beginning of each episode in the training or validation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).

on_mpc_failure(episode, timestep, status, raises)

Callback in case of failure of the MPC solver.

on_timestep_end(env, episode, timestep)

Callback called at the end of each time iteration.

on_validation_end(env, returns)

Callback called at the end of the validation process (see mpcrl.Agent.evaluate).

on_validation_start(env)

Callback called at the beginning of the validation process (see mpcrl.Agent.evaluate).
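The callbacks above fire at fixed points of a training or validation loop. The sketch below illustrates one plausible ordering, inferred only from the descriptions on this page. It does not import mpcrl: `RecordingCallbacks`, `ToyEnv` and `run_validation` are hypothetical stand-ins, not library code, and the real loop may differ in detail.

```python
class RecordingCallbacks:
    """Hypothetical stand-in exposing the callback names listed above."""

    def __init__(self):
        self.events = []  # callback names, in the order they fired

    def on_validation_start(self, env):
        self.events.append("on_validation_start")

    def on_episode_start(self, env, episode, state):
        self.events.append("on_episode_start")

    def on_env_step(self, env, episode, timestep):
        self.events.append("on_env_step")

    def on_timestep_end(self, env, episode, timestep):
        self.events.append("on_timestep_end")

    def on_episode_end(self, env, episode, rewards):
        self.events.append("on_episode_end")

    def on_validation_end(self, env, returns):
        self.events.append("on_validation_end")


class ToyEnv:
    """Hypothetical environment with a gymnasium-like reset/step interface."""

    def __init__(self, horizon=2):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.horizon
        # observation, reward, terminated, truncated, info
        return float(self.t), -1.0, False, truncated, {}


def run_validation(agent, env, episodes):
    """Assumed ordering of the callbacks during a validation run."""
    returns = [0.0] * episodes
    agent.on_validation_start(env)
    for episode in range(episodes):
        state, _ = env.reset()
        agent.on_episode_start(env, episode, state)
        timestep, done = 0, False
        while not done:
            state, reward, terminated, truncated, _ = env.step(0.0)
            agent.on_env_step(env, episode, timestep)  # right after env.step
            returns[episode] += reward
            timestep += 1
            agent.on_timestep_end(env, episode, timestep)  # end of the iteration
            done = terminated or truncated
        agent.on_episode_end(env, episode, returns[episode])
    agent.on_validation_end(env, returns)
    return returns


agent = RecordingCallbacks()
returns = run_validation(agent, ToyEnv(horizon=2), episodes=1)
print(agent.events)
```

In this sketch, on_env_step and on_timestep_end fire equally often, but the former runs immediately after the environment step and the latter only once the rest of the iteration has finished.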

on_env_step(env, episode, timestep)

Callback called after each call to gymnasium.Env.step.

Parameters:
env : gym env

The gym environment on which the agent is being trained.

episode : int

Number of the training episode.

timestep : int

Time instant of the current training episode.

Return type:

None

on_episode_end(env, episode, rewards)

Callback called at the end of each episode in the training or evaluation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).

Parameters:
env : gym env

The gym environment on which the agent is being trained.

episode : int

Number of the training episode.

rewards : float

Cumulative rewards for this episode.

Return type:

None
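As a minimal sketch of how the arguments of on_episode_end might be consumed, the hypothetical `EpisodeLog` class below stores the cumulative rewards of each episode. It is a standalone stand-in that only mirrors the signature documented above; it does not subclass or import anything from mpcrl, and the reward values are made up for illustration.

```python
class EpisodeLog:
    """Hypothetical stand-in; only the on_episode_end signature mirrors this page."""

    def __init__(self):
        self.history = {}  # episode number -> cumulative rewards

    def on_episode_end(self, env, episode, rewards):
        # `rewards` is the cumulative reward of the episode that just ended
        self.history[episode] = float(rewards)

    def running_mean(self):
        """Mean of the cumulative rewards over all episodes seen so far."""
        return sum(self.history.values()) / len(self.history)


log = EpisodeLog()
# Illustrative values only; `env` is unused by this sketch, so None is passed.
for episode, rewards in enumerate([-12.5, -3.0, -7.0]):
    log.on_episode_end(env=None, episode=episode, rewards=rewards)

print(log.history)
print(log.running_mean())
```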

on_episode_start(env, episode, state)

Callback called at the beginning of each episode in the training or validation process (see mpcrl.Agent.evaluate, mpcrl.LearningAgent.train and mpcrl.LearningAgent.train_offpolicy).

Parameters:
env : gym env

The gym environment on which the agent is being trained.

episode : int

Number of the training episode.

state : ObsType

Starting state for this episode.

Return type:

None

on_mpc_failure(episode, timestep, status, raises)

Callback in case of failure of the MPC solver.

Parameters:
episode : int

Number of the episode when the failure happened.

timestep : int or None

Timestep of the current episode when the failure happened. Can be None if the error occurs between episodes or if no notion of time step is available.

status : str

Status of the solver that failed.

raises : bool

Whether the failure should be raised as an exception (True) or issued as a warning (False).

Return type:

None
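The role of the raises flag and of a None timestep can be sketched as follows. This is a standalone illustration under stated assumptions, not the library's implementation: `SolverFailureError` and `SolverFailureWarning` are hypothetical names, the message format is invented, and the status string is only an example value.

```python
import warnings


class SolverFailureError(RuntimeError):
    """Hypothetical exception type used only in this sketch."""


class SolverFailureWarning(RuntimeWarning):
    """Hypothetical warning category used only in this sketch."""


def on_mpc_failure(episode, timestep, status, raises):
    """Report a solver failure as an exception or a warning, depending on `raises`."""
    where = f"episode {episode}"
    if timestep is not None:  # None when no time step is tied to the failure
        where += f", timestep {timestep}"
    msg = f"MPC solver failed at {where} with status: {status}"
    if raises:
        raise SolverFailureError(msg)
    warnings.warn(msg, SolverFailureWarning)


# raises=False: the failure is only reported as a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    on_mpc_failure(episode=3, timestep=None, status="Infeasible_Problem_Detected", raises=False)

# raises=True: the failure interrupts execution with an exception.
try:
    on_mpc_failure(episode=3, timestep=7, status="Infeasible_Problem_Detected", raises=True)
    raised = False
except SolverFailureError as exc:
    raised, error_message = True, str(exc)
```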

on_timestep_end(env, episode, timestep)

Callback called at the end of each time iteration. It is called with the same frequency as on_env_step, but with different timing.

Parameters:
env : gym env

The gym environment on which the agent is being trained.

episode : int

Number of the training episode.

timestep : int

Time instant of the current training episode.

Return type:

None

on_validation_end(env, returns)

Callback called at the end of the validation process (see mpcrl.Agent.evaluate).

Parameters:
env : gym env

The gym environment on which the agent has been validated.

returns : array of double

Each episode’s cumulative rewards.

Return type:

None
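Since returns holds one cumulative reward per validation episode, a callback can reduce it to summary statistics. The sketch below assumes only that returns is an iterable of numbers; `ValidationSummary` is a hypothetical standalone class that mirrors the documented signature and does not depend on mpcrl, and the sample values are made up.

```python
import statistics


class ValidationSummary:
    """Hypothetical stand-in; only the on_validation_end signature mirrors this page."""

    def __init__(self):
        self.summary = None

    def on_validation_end(self, env, returns):
        # Convert to plain floats so any array-like of numbers is accepted.
        values = [float(r) for r in returns]
        self.summary = {
            "episodes": len(values),
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }


summary = ValidationSummary()
# Illustrative values only; `env` is unused by this sketch, so None is passed.
summary.on_validation_end(env=None, returns=[-10.0, -14.0, -12.0])
print(summary.summary)
```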

on_validation_start(env)

Callback called at the beginning of the validation process (see mpcrl.Agent.evaluate).

Parameters:
env : gym env

The gym environment on which the agent is being validated.

Return type:

None

Examples using mpcrl.core.callbacks.AgentCallbackMixin

Off-policy Q-learning


On-policy Deterministic Policy Gradient


On-policy Q-learning
