mpcrl.core.exploration.StepWiseExploration
- class mpcrl.core.exploration.StepWiseExploration(base_exploration, step_size, stepwise_decay=True)
Bases: ExplorationStrategy
Wrapper-like exploration that keeps the wrapped base exploration strategy constant for a number of steps, thus creating a piecewise-constant exploration.
This class takes in another exploration instance and allows it to change only every N steps, thus yielding a step-wise strategy with steps of the given length. This is useful when, e.g., the exploration strategy must be kept constant across time for a number of steps.
- Parameters:
- base_exploration : ExplorationStrategy
The base exploration strategy to be made step-wise.
- step_size : int
Size of each step.
- stepwise_decay : bool, optional
Enables the decay step to also be step-wise, i.e., applied only every N steps.
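The piecewise-constant behaviour described above can be sketched without the library itself. The classes below (ToyEpsilonGreedy and ToyStepWise) are hypothetical stand-ins written for illustration only; they are not mpcrl's implementation, and the caching logic is an assumption about how such a wrapper could work.

```python
import random


class ToyEpsilonGreedy:
    """Hypothetical stand-in for a base exploration strategy (not mpcrl's class)."""

    def __init__(self, epsilon: float, seed: int = 0) -> None:
        self.epsilon = epsilon
        self._rng = random.Random(seed)

    def can_explore(self) -> bool:
        # A fresh random draw on every call.
        return self._rng.random() < self.epsilon


class ToyStepWise:
    """Hypothetical wrapper: queries the base strategy only every `step_size` calls."""

    def __init__(self, base: ToyEpsilonGreedy, step_size: int) -> None:
        if step_size < 1:
            raise ValueError("step_size must be a positive integer")
        self.base = base
        self.step_size = step_size
        self._calls = 0
        self._cached = False

    def can_explore(self) -> bool:
        # Refresh the cached decision at the start of each block of `step_size` calls,
        # and reuse it for the rest of the block.
        if self._calls % self.step_size == 0:
            self._cached = self.base.can_explore()
        self._calls += 1
        return self._cached


wrapped = ToyStepWise(ToyEpsilonGreedy(epsilon=0.5, seed=42), step_size=3)
decisions = [wrapped.can_explore() for _ in range(12)]

# Group the decisions into consecutive blocks of length step_size.
blocks = [decisions[i : i + 3] for i in range(0, 12, 3)]
print(blocks)
```

Within each block of three calls the decision is identical, which is the step-wise behaviour the wrapper is meant to provide; only the block boundaries allow a change.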
Notes
Be careful: this exploration wrapper ends up modifying the exploration chance and magnitude (if any) of the wrapped base strategy, as well as its step behaviour. That is, the decay/increment of the base exploration's schedulers (again, if any) occurs less often, with the interval between updates enlarged by a factor of the step size. This is because the number of calls to the base exploration's step method is reduced by a factor of the step size.
Methods
- can_explore(): Computes whether, according to the exploration strategy, the agent should explore or not now, at the current instant.
- perturbation(*args, **kwargs): Returns a random perturbation.
- reset([_]): Resets the exploration status, in case it is non-deterministic.
- step(*_, **__): Steps (i.e., decays or increases) any scheduler that this class holds, e.g., exploration's strength and chances.
Attributes
- hook: Returns the hook of the base exploration strategy, if any.
- mode: Returns the mode of the base exploration strategy.
- can_explore()
Computes whether, according to the exploration strategy, the agent should explore or not now, at the current instant.
- Returns:
- bool
True if the agent should explore according to this strategy; otherwise, False.
- Return type:
bool
- property hook: Literal['on_update', 'on_episode_end', 'on_timestep_end'] | None
Returns the hook of the base exploration strategy, if any.
- property mode: Literal['gradient-based', 'additive']
Returns the mode of the base exploration strategy.
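The effect mentioned in the Notes (fewer calls to the base strategy's step method) can be checked with a small counting experiment. This is a minimal sketch under stated assumptions: ToyDecayingStrategy and ToyStepWiseDecay are hypothetical names, not part of mpcrl, and the flag handling only mirrors the documented meaning of stepwise_decay (decay applied every N steps when enabled, every step otherwise).

```python
class ToyDecayingStrategy:
    """Hypothetical base strategy whose strength decays on each `step` call."""

    def __init__(self, strength: float, decay_rate: float) -> None:
        self.strength = strength
        self.decay_rate = decay_rate
        self.step_calls = 0

    def step(self) -> None:
        self.strength *= self.decay_rate
        self.step_calls += 1


class ToyStepWiseDecay:
    """Hypothetical wrapper that forwards `step` only every `step_size` calls."""

    def __init__(
        self, base: ToyDecayingStrategy, step_size: int, stepwise_decay: bool = True
    ) -> None:
        self.base = base
        self.step_size = step_size
        self.stepwise_decay = stepwise_decay
        self._counter = 0

    def step(self) -> None:
        if not self.stepwise_decay:
            # Assumed behaviour when the flag is off: decay on every call.
            self.base.step()
            return
        self._counter += 1
        if self._counter % self.step_size == 0:
            self.base.step()


def run(stepwise_decay: bool, n_calls: int = 20, step_size: int = 5):
    """Calls the wrapper's `step` n_calls times; returns base call count and strength."""
    base = ToyDecayingStrategy(strength=1.0, decay_rate=0.5)
    wrapper = ToyStepWiseDecay(base, step_size, stepwise_decay)
    for _ in range(n_calls):
        wrapper.step()
    return base.step_calls, base.strength


calls_stepwise, strength_stepwise = run(stepwise_decay=True)
calls_plain, strength_plain = run(stepwise_decay=False)
print(calls_stepwise, calls_plain)  # 4 20
```

With 20 wrapper steps and a step size of 5, the toy base strategy decays 4 times instead of 20, so a decay rate tuned for the unwrapped strategy leaves a much larger exploration strength after wrapping. In practice this suggests re-tuning the base scheduler's rate when a step-wise wrapper is introduced.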