mpcrl.core.exploration.StepWiseExploration

class mpcrl.core.exploration.StepWiseExploration(base_exploration, step_size, stepwise_decay=True)

Bases: ExplorationStrategy

Wrapper-like exploration that keeps the wrapped base exploration strategy constant for a number of steps, thus creating a piecewise-constant exploration.

This class takes in another exploration instance and allows it to change only every N steps (N being the step size), thus yielding a step-wise strategy with steps of the given length. This is useful when, e.g., the exploration must be held constant over several consecutive time steps instead of changing at every one of them.
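The holding behaviour described above can be illustrated with a minimal, self-contained sketch. It does not import mpcrl: `ToyBaseExploration` and `StepWiseSketch` are hypothetical classes written only for this illustration, and the sketch assumes the wrapper simply caches the base strategy's output and refreshes it once every `step_size` calls. The library's actual implementation may differ.

```python
import numpy as np


class ToyBaseExploration:
    """Hypothetical stand-in for a base exploration strategy (not part of mpcrl)."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = np.random.default_rng(seed)

    def perturbation(self, size: int) -> np.ndarray:
        # a fresh random draw on every call
        return self.rng.normal(size=size)


class StepWiseSketch:
    """Hypothetical sketch: query the base only every `step_size` calls and
    return the cached value in between (assumed behaviour, for illustration)."""

    def __init__(self, base: ToyBaseExploration, step_size: int) -> None:
        self.base = base
        self.step_size = step_size
        self._calls = 0
        self._cached = np.empty(0)

    def perturbation(self, size: int) -> np.ndarray:
        # refresh the cached perturbation at the start of each step
        if self._calls % self.step_size == 0:
            self._cached = self.base.perturbation(size)
        self._calls += 1
        # return a copy so callers cannot mutate the held value
        return self._cached.copy()


explorer = StepWiseSketch(ToyBaseExploration(seed=42), step_size=3)
samples = [explorer.perturbation(2) for _ in range(6)]
```

Here the six samples form two runs of three identical perturbations: the base strategy is actually sampled only on the first and fourth calls.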

Parameters:
base_exploration : ExplorationStrategy

The base exploration strategy to be made step-wise.

step_size : int

Size of each step, i.e., the number of steps N for which the base exploration is held constant.

stepwise_decay : bool, optional

If True (the default), the decay step is also made step-wise, i.e., it is applied only every N steps.

Notes

Be careful: this wrapper also ends up modifying the exploration chance and magnitude (if any) of the wrapped base strategy, as well as its step behaviour. Specifically, the interval between decays/increments of the base exploration's schedulers (again, if any) is enlarged by a factor equal to the step size. This is because the number of calls to the base exploration's step method is reduced by that same factor.
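The effect on the schedulers can be made concrete with a small worked example. The sketch below is self-contained and hypothetical: `ToyExponentialScheduler` and `StepWiseDecaySketch` are not mpcrl classes, and it assumes that with `stepwise_decay=True` the wrapper forwards `step()` to the base only on every `step_size`-th call. With an initial value of 0.5, a decay factor of 0.9, `step_size=5` and 10 calls, the base is decayed 2 times instead of 10, giving 0.5 · 0.9² = 0.405 rather than 0.5 · 0.9¹⁰ ≈ 0.174.

```python
class ToyExponentialScheduler:
    """Hypothetical scheduler: multiplies its value by `factor` on each step()."""

    def __init__(self, init_value: float, factor: float) -> None:
        self.value = init_value
        self.factor = factor

    def step(self) -> None:
        self.value *= self.factor


class StepWiseDecaySketch:
    """Hypothetical wrapper: forwards step() every `step_size` calls when
    `stepwise_decay` is True, and on every call otherwise (assumed behaviour)."""

    def __init__(self, scheduler, step_size: int, stepwise_decay: bool = True) -> None:
        self.scheduler = scheduler
        self.step_size = step_size
        self.stepwise_decay = stepwise_decay
        self._calls = 0

    def step(self) -> None:
        self._calls += 1
        if not self.stepwise_decay or self._calls % self.step_size == 0:
            self.scheduler.step()


def value_after(n_calls: int, step_size: int, stepwise_decay: bool) -> float:
    scheduler = ToyExponentialScheduler(init_value=0.5, factor=0.9)
    wrapper = StepWiseDecaySketch(scheduler, step_size, stepwise_decay)
    for _ in range(n_calls):
        wrapper.step()
    return scheduler.value


held = value_after(10, step_size=5, stepwise_decay=True)    # decayed on calls 5 and 10 only
plain = value_after(10, step_size=5, stepwise_decay=False)  # decayed on all 10 calls
```

If the base schedule was tuned assuming one decay per call, one option is to compensate when choosing its parameters, e.g., by raising the decay factor to the power of the step size.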

Methods

can_explore()

Computes whether, according to the exploration strategy, the agent should explore at the current instant.

perturbation(*args, **kwargs)

Returns a random perturbation.

reset([_])

Resets the exploration status, in case it is non-deterministic.

step(*_, **__)

Steps (i.e., decays or increases) any scheduler that this class holds, e.g., exploration's strength and chances.

Attributes

hook

Returns the hook of the base exploration strategy, if any.

mode

Returns the mode of the base exploration strategy.

can_explore()

Computes whether, according to the exploration strategy, the agent should explore at the current instant.

Returns:
bool

True if the agent should explore according to this strategy; otherwise, False.

Return type:

bool
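For a step-wise wrapper, the exploration decision itself is expected to be held across each step as well. The following self-contained sketch illustrates this under the assumption that the wrapper caches the base strategy's `can_explore()` result and re-queries it only every `step_size` calls; `ToyEpsilonGreedy` and `StepWiseCanExploreSketch` are hypothetical names, not mpcrl's classes.

```python
import random


class ToyEpsilonGreedy:
    """Hypothetical base strategy: explore with probability `epsilon`."""

    def __init__(self, epsilon: float, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def can_explore(self) -> bool:
        return self.rng.random() < self.epsilon


class StepWiseCanExploreSketch:
    """Hypothetical wrapper holding the base decision for `step_size` calls."""

    def __init__(self, base: ToyEpsilonGreedy, step_size: int) -> None:
        self.base = base
        self.step_size = step_size
        self._calls = 0
        self._decision = False

    def can_explore(self) -> bool:
        if self._calls % self.step_size == 0:
            # only at the start of a step is the base strategy actually queried
            self._decision = self.base.can_explore()
        self._calls += 1
        return self._decision


wrapper = StepWiseCanExploreSketch(ToyEpsilonGreedy(epsilon=0.5, seed=1), step_size=4)
decisions = [wrapper.can_explore() for _ in range(20)]
blocks = [decisions[i : i + 4] for i in range(0, 20, 4)]
```

Each block of four consecutive decisions is either all True or all False, so the agent explores (or not) for whole steps at a time.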

property hook: Literal['on_update', 'on_episode_end', 'on_timestep_end'] | None

Returns the hook of the base exploration strategy, if any.

property mode: Literal['gradient-based', 'additive']

Returns the mode of the base exploration strategy.
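Both properties simply report the corresponding attribute of the wrapped strategy. A minimal sketch of this kind of read-only delegation is shown below; `ToyBase` and `DelegatingWrapperSketch` are hypothetical classes, and only the allowed literal values are taken from the type annotations above.

```python
from typing import Literal, Optional

HookType = Optional[Literal["on_update", "on_episode_end", "on_timestep_end"]]
ModeType = Literal["gradient-based", "additive"]


class ToyBase:
    """Hypothetical base strategy exposing a hook and a mode."""

    def __init__(self, hook: HookType, mode: ModeType) -> None:
        self.hook = hook
        self.mode = mode


class DelegatingWrapperSketch:
    """Stores no hook/mode of its own and forwards both to the wrapped base."""

    def __init__(self, base: ToyBase) -> None:
        self.base = base

    @property
    def hook(self) -> HookType:
        return self.base.hook

    @property
    def mode(self) -> ModeType:
        return self.base.mode


wrapped = DelegatingWrapperSketch(ToyBase(hook="on_episode_end", mode="additive"))
```

Because the values are read from the base on every access, any change to the base strategy's hook or mode is immediately visible through the wrapper.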

perturbation(*args, **kwargs)

Returns a random perturbation.

Return type:

ndarray[tuple[Any, ...], dtype[floating]]

reset(_=None)

Resets the exploration status, in case it is non-deterministic.

Return type:

None

step(*_, **__)

Steps (i.e., decays or increases) any scheduler that this class holds, e.g., exploration’s strength and chances.

Return type:

None