mpcrl.UpdateStrategy

class mpcrl.UpdateStrategy(frequency, hook='on_timestep_end', skip_first=0)

Bases: object

A class holding information on the update strategy to be used by the learning algorithm.

Parameters:
frequency : int

Update frequency, measured in hook calls: an update is carried out once every frequency calls of the hook.

skip_first : int, optional

Skips the first skip_first updates. By default 0, so no update is skipped. This is useful when, e.g., the agent has to wait for the experience buffer to be filled before starting to update.

hook : {"on_episode_end", "on_timestep_end"}, optional

Specifies to which callback to hook, i.e., when to check if an update is due according to the given frequency. The options are:

  • "on_episode_end" checks if an update is due after each episode ends.

  • "on_timestep_end" checks if an update is due after each simulation time step.

By default, "on_timestep_end" is selected.
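The interplay between frequency and skip_first can be sketched with a toy model. This is an illustrative assumption, not the library's implementation: the helper update_schedule and its n_calls argument are hypothetical, and the sketch assumes that an update comes due on every frequency-th hook call and that the first skip_first due updates are suppressed.

```python
def update_schedule(frequency, skip_first=0, n_calls=12):
    """Toy model (hypothetical helper, not part of mpcrl).

    Returns one flag per hook call, telling whether that call
    would trigger an update under the assumptions stated above.
    """
    due_count = 0  # number of times an update has come due so far
    flags = []
    for call in range(1, n_calls + 1):
        # Assumption: an update comes due on every `frequency`-th hook call.
        due = call % frequency == 0
        if due:
            due_count += 1
        # Assumption: the first `skip_first` due updates are skipped.
        flags.append(due and due_count > skip_first)
    return flags


schedule = update_schedule(frequency=3, skip_first=1, n_calls=9)
print(schedule)
```

Under these assumptions, updates come due at calls 3, 6 and 9, and the one at call 3 is skipped, so the flags are only true for calls 6 and 9.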

Methods

can_update()

Returns whether an update must be carried out now, at the current instant, according to the specified strategy.

Returns:
bool

True if the agent should update according to this strategy; otherwise, False.

Notes

This method advances the internal iterators by calling next on them to check whether an update is due. Calling it therefore has a side effect on the state of these iterators, and calling it again immediately can return a different result.
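The side effect described in this note can be illustrated with a minimal stand-in. ToyUpdateStrategy is a hypothetical class written for this sketch, not the mpcrl class; it assumes a single internal iterator that yields True once every frequency calls.

```python
from itertools import cycle


class ToyUpdateStrategy:
    """Hypothetical stand-in used only to illustrate the side effect."""

    def __init__(self, frequency):
        # Assumption: yields False (frequency - 1) times, then True, and repeats.
        self._cycle = cycle([False] * (frequency - 1) + [True])

    def can_update(self):
        # Each call consumes one element of the iterator,
        # so the answer depends on how often it was called before.
        return next(self._cycle)


strategy = ToyUpdateStrategy(frequency=2)
first = strategy.can_update()   # consumes the first element
second = strategy.can_update()  # consumes the second element
print(first, second)
```

Because each query consumes a step, it is safer to call can_update() once per hook call and store the result in a variable than to query it again in the same step.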

Examples using mpcrl.UpdateStrategy

On-policy Deterministic Policy Gradient