mpcrl.core.exploration.ExplorationStrategy
- class mpcrl.core.exploration.ExplorationStrategy(hook='on_update', mode='gradient-based')
Bases: ABC
Base abstract class for exploration strategies.
- Parameters:
- hook{“on_update”, “on_episode_end”, “on_timestep_end”}, optional
Specifies which callback to hook onto, i.e., when to step the exploration’s schedulers (if any) to, e.g., decay the chances of exploring or the perturbation strength (see also step). The options are:
  - "on_update", which steps the exploration after each agent’s update
  - "on_episode_end", which steps the exploration after each episode ends
  - "on_timestep_end", which steps the exploration after each env’s timestep.
By default, "on_update" is selected.
- mode{“gradient-based”, “additive”}, optional
Mode of application of explorative perturbations to the MPC. If "additive", then the drawn perturbation is added to the optimal action computed by the MPC solver. By default, "gradient-based" is selected; in this mode the perturbation enters directly in the MPC objective and is multiplied by the first action, thus affecting its gradient.
Methods
- can_explore(): Computes whether, according to the exploration strategy, the agent should explore at the current instant.
- perturbation(*args, **kwargs): Returns a random perturbation.
- reset([_]): Resets the exploration status, in case it is non-deterministic.
- step(*args, **kwargs): Steps (i.e., decays or increases) any scheduler that this class holds, e.g., exploration's strength and chances.
Attributes
- hook: Gets which callback the exploration is hooked on, i.e., when to step the exploration's schedulers (if any) to, e.g., decay the chances of exploring or the perturbation strength (see also step).
- mode: Gets the mode of application of explorative perturbations to the MPC.
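The interface listed above can be sketched without installing the library. The block below is a minimal stand-in, not mpcrl's implementation: `ExplorationSketch` and `DecayingEpsilon` are hypothetical names, the decay rule and Gaussian perturbation are illustrative assumptions, and only `can_explore` is marked abstract because that is all this page confirms.

```python
import random
from abc import ABC, abstractmethod


class ExplorationSketch(ABC):
    """Hypothetical stand-in mirroring the documented interface (not mpcrl code)."""

    def __init__(self, hook="on_update", mode="gradient-based"):
        self._hook = hook
        self._mode = mode

    @property
    def hook(self):
        # Callback at which step() is meant to be called; may be None.
        return self._hook

    @property
    def mode(self):
        # Either "gradient-based" or "additive".
        return self._mode

    @abstractmethod
    def can_explore(self) -> bool:
        """Returns True if the agent should explore at the current instant."""


class DecayingEpsilon(ExplorationSketch):
    """Assumed example strategy: explore with probability epsilon, decayed on step()."""

    def __init__(self, epsilon, decay, strength, seed=None, **kwargs):
        super().__init__(**kwargs)
        self.epsilon = epsilon
        self.decay = decay
        self.strength = strength
        self._seed = seed
        self._rng = random.Random(seed)

    def can_explore(self) -> bool:
        return self._rng.random() < self.epsilon

    def perturbation(self) -> float:
        # Zero-mean Gaussian draw scaled by the current strength.
        return self._rng.gauss(0.0, self.strength)

    def step(self) -> None:
        # Decay the chance of exploring.
        self.epsilon *= self.decay

    def reset(self, seed=None) -> None:
        # Re-seed the generator so that subsequent draws are reproducible.
        self._rng = random.Random(self._seed if seed is None else seed)


strategy = DecayingEpsilon(
    epsilon=1.0, decay=0.5, strength=0.1, seed=0,
    hook="on_episode_end", mode="additive",
)
first_draw = strategy.perturbation()
strategy.reset()
second_draw = strategy.perturbation()  # same value as first_draw after reset
strategy.step()  # epsilon goes from 1.0 to 0.5
```

In this sketch `reset` re-seeds the random generator, which is one plausible reading of "resets the exploration status, in case it is non-deterministic"; the actual library may behave differently.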
- abstractmethod can_explore()
Computes whether, according to the exploration strategy, the agent should explore at the current instant.
- Returns:
- bool
True if the agent should explore according to this strategy; otherwise, False.
- Return type: bool
- property hook: Literal['on_update', 'on_episode_end', 'on_timestep_end'] | None
Gets which callback the exploration is hooked on, i.e., when to step the exploration’s schedulers (if any) to, e.g., decay the chances of exploring or the perturbation strength (see also step). Can be None in case no hook is needed.
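To make the effect of the hook concrete, the following toy loop counts how many times `step` would be called for each hook value. It is only a sketch: `count_scheduler_steps` is a hypothetical helper, and the loop structure (an agent update every second timestep) is an assumption, not how mpcrl schedules its callbacks.

```python
def count_scheduler_steps(hook, episodes=2, timesteps=5, update_every=2):
    """Counts how often step() would fire in a toy training loop.

    Assumption: the agent updates every `update_every` timesteps.
    """
    steps = 0
    for _ in range(episodes):
        for t in range(1, timesteps + 1):
            if t % update_every == 0 and hook == "on_update":
                steps += 1  # stepped after an agent update
            if hook == "on_timestep_end":
                steps += 1  # stepped after every env timestep
        if hook == "on_episode_end":
            steps += 1  # stepped once per episode
    return steps


for hook in ("on_update", "on_episode_end", "on_timestep_end", None):
    print(hook, count_scheduler_steps(hook))
# on_update 4
# on_episode_end 2
# on_timestep_end 10
# None 0
```

With a hook of None the schedulers are never stepped in this sketch, which matches the case where no hook is needed.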
- property mode: Literal['gradient-based', 'additive']
Gets the mode of application of explorative perturbations to the MPC.
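The difference between the two modes can be illustrated on a one-dimensional toy problem. This is a sketch under stated assumptions: the quadratic cost, the helper `solve_toy_mpc`, and the sign with which the perturbation term is added to the objective are all hypothetical and chosen only so that the optimum has a closed form.

```python
def solve_toy_mpc(target: float, perturbation: float = 0.0) -> float:
    """Minimizes (u - target)**2 + perturbation * u over u, in closed form.

    Setting the derivative 2 * (u - target) + perturbation to zero gives
    u = target - perturbation / 2.
    """
    return target - perturbation / 2.0


target = 1.0  # unperturbed optimal first action of the toy problem
p = 0.4       # a drawn perturbation, fixed here for reproducibility

# "additive": solve first, then add the perturbation to the optimal action.
u_additive = solve_toy_mpc(target) + p

# "gradient-based": the term p * u enters the objective before solving,
# so the perturbation changes the gradient and hence the minimizer.
u_gradient = solve_toy_mpc(target, perturbation=p)

print(u_additive, u_gradient)
```

In the additive case the action is shifted by exactly the perturbation, whereas in the gradient-based case the shift depends on the curvature of the objective (here, half the perturbation with opposite sign).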