mpcrl.core.exploration.ExplorationStrategy

class mpcrl.core.exploration.ExplorationStrategy(hook='on_update', mode='gradient-based')

Bases: ABC

Base abstract class for exploration strategies.

Parameters:
hook{“on_update”, “on_episode_end”, “on_timestep_end”}, optional

Specifies which callback to hook onto, i.e., when to step the exploration’s schedulers (if any), e.g., to decay the chances of exploring or the perturbation strength (see also step). The options are:

  • "on_update", which steps the exploration after each update of the agent

  • "on_episode_end", which steps the exploration after each episode ends

  • "on_timestep_end", which steps the exploration after each timestep of the environment.

By default, "on_update" is selected.
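To make the three hook options concrete, the following is a minimal, self-contained sketch that counts how often the schedulers would be stepped under each one. It does not use mpcrl: the function count_scheduler_steps, the loop structure, and the assumption that the agent updates every update_every timesteps are all hypothetical, chosen only for illustration.

```python
def count_scheduler_steps(hook, n_episodes=2, timesteps_per_episode=6, update_every=3):
    """Counts scheduler steps in a toy training loop (hypothetical, not mpcrl code)."""
    steps = 0
    for _ in range(n_episodes):
        for t in range(1, timesteps_per_episode + 1):
            # ... the agent acts and the environment advances by one timestep ...
            if hook == "on_timestep_end":
                steps += 1
            # Assumption: the agent updates its parameters every `update_every` timesteps.
            if t % update_every == 0:
                if hook == "on_update":
                    steps += 1
        if hook == "on_episode_end":
            steps += 1
    return steps


counts = {
    hook: count_scheduler_steps(hook)
    for hook in ("on_update", "on_episode_end", "on_timestep_end")
}
```

With these toy settings, "on_timestep_end" steps the schedulers most often and "on_episode_end" least often, so the choice of hook effectively sets how fast any decay schedule progresses relative to training.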

mode{“gradient-based”, “additive”}, optional

Mode of application of explorative perturbations to the MPC. If "additive", then the drawn perturbation is added to the optimal action computed by the MPC solver. By default, "gradient-based" is selected; in this mode, the perturbation enters the MPC objective directly, where it is multiplied by the first action, thus affecting the gradient of the objective with respect to that action.
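The difference between the two modes can be illustrated on a toy problem. The sketch below assumes a hypothetical unconstrained quadratic objective in the first action, standing in for the MPC problem; the names H, g, solve and explore are not part of mpcrl. In "additive" mode the perturbation is added to the minimizer, whereas in "gradient-based" mode it is added to the linear term of the objective, so the minimizer itself shifts.

```python
import numpy as np

# Hypothetical stand-in for the MPC objective: 0.5 * u @ H @ u + g @ u,
# where u is the first action.
H = np.diag([2.0, 4.0])     # positive-definite Hessian
g = np.array([-2.0, -4.0])  # linear cost term


def solve(linear_term):
    """Returns the minimizer of 0.5 * u @ H @ u + linear_term @ u."""
    return -np.linalg.solve(H, linear_term)


def explore(perturbation, mode):
    """Applies a perturbation to the toy problem in the given mode."""
    if mode == "additive":
        # The perturbation is added to the already-computed optimal action.
        return solve(g) + perturbation
    if mode == "gradient-based":
        # The term perturbation @ u is added to the objective, which shifts
        # its gradient by `perturbation` before the problem is solved.
        return solve(g + perturbation)
    raise ValueError(f"unknown mode: {mode!r}")


p = np.array([0.2, -0.4])
u_nominal = solve(g)
u_additive = explore(p, "additive")
u_gradient = explore(p, "gradient-based")
```

In this unconstrained toy case both modes move the action away from the nominal one, but by different amounts and in different directions. In a constrained MPC problem, a perturbation that enters the objective is handled by the solver, so the resulting action still satisfies the constraints, which an additive perturbation does not guarantee.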

Methods

can_explore()

Computes whether, according to the exploration strategy, the agent should explore at the current instant.

perturbation(*args, **kwargs)

Returns a random perturbation.

reset([_])

Resets the exploration status, in case it is non-deterministic.

step(*args, **kwargs)

Steps (i.e., decays or increases) any schedulers that this class holds, e.g., those of the exploration strength and of the chances of exploring.

Attributes

hook

Gets the callback that the exploration is hooked onto, i.e., when to step the exploration's schedulers (if any), e.g., to decay the chances of exploring or the perturbation strength (see also step).

mode

Gets the mode of application of explorative perturbations to the MPC.

abstractmethod can_explore()

Computes whether, according to the exploration strategy, the agent should explore at the current instant.

Returns:
bool

True if the agent should explore according to this strategy; otherwise, False.

Return type:

bool

property hook: Literal['on_update', 'on_episode_end', 'on_timestep_end'] | None

Gets the callback that the exploration is hooked onto, i.e., when to step the exploration’s schedulers (if any), e.g., to decay the chances of exploring or the perturbation strength (see also step). Can be None if no hook is needed.

property mode: Literal['gradient-based', 'additive']

Gets the mode of application of explorative perturbations to the MPC.

abstractmethod perturbation(*args, **kwargs)

Returns a random perturbation.

Return type:

ndarray[tuple[Any, ...], dtype[floating]]

reset(_=None)

Resets the exploration status, in case it is non-deterministic.

Return type:

None

abstractmethod step(*args, **kwargs)

Steps (i.e., decays or increases) any schedulers that this class holds, e.g., those of the exploration’s strength and of the chances of exploring.

Return type:

None
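As an illustration of how the abstract interface fits together, the sketch below implements a decaying epsilon-greedy strategy. To stay self-contained it does not import mpcrl: ExplorationSketch is a hypothetical stand-in that only mirrors the methods and properties documented on this page, and DecayingEpsilonGreedy, its constructor arguments, and its reseeding behaviour in reset are assumptions made for the example, not the library's implementation.

```python
from abc import ABC, abstractmethod

import numpy as np


class ExplorationSketch(ABC):
    """Hypothetical stand-in mirroring the documented interface (not the mpcrl class)."""

    def __init__(self, hook="on_update", mode="gradient-based"):
        self._hook = hook
        self._mode = mode

    @property
    def hook(self):
        return self._hook

    @property
    def mode(self):
        return self._mode

    @abstractmethod
    def can_explore(self) -> bool: ...

    @abstractmethod
    def perturbation(self, *args, **kwargs) -> np.ndarray: ...

    def reset(self, _=None) -> None:
        # Nothing to reset in the base sketch.
        return None

    @abstractmethod
    def step(self, *args, **kwargs) -> None: ...


class DecayingEpsilonGreedy(ExplorationSketch):
    """Explores with probability epsilon; epsilon and strength decay at each step."""

    def __init__(self, epsilon, strength, decay, size, seed=None, **kwargs):
        super().__init__(**kwargs)
        self.epsilon = epsilon
        self.strength = strength
        self.decay = decay
        self.size = size
        self._seed = seed
        self._rng = np.random.default_rng(seed)

    def can_explore(self) -> bool:
        # Explore when a uniform sample in [0, 1) falls below epsilon.
        return bool(self._rng.random() < self.epsilon)

    def perturbation(self, *args, **kwargs) -> np.ndarray:
        # Zero-mean Gaussian perturbation scaled by the current strength.
        return self._rng.normal(scale=self.strength, size=self.size)

    def reset(self, seed=None) -> None:
        # Assumption: resetting means reseeding the random number generator.
        self._rng = np.random.default_rng(self._seed if seed is None else seed)

    def step(self, *args, **kwargs) -> None:
        # Multiplicative decay of both the exploration chance and strength.
        self.epsilon *= self.decay
        self.strength *= self.decay


strategy = DecayingEpsilonGreedy(epsilon=1.0, strength=0.5, decay=0.5, size=2, seed=0)
if strategy.can_explore():
    action_perturbation = strategy.perturbation()
strategy.step()  # would be invoked by whichever callback `hook` selects
```

In this sketch, can_explore and perturbation are queried whenever an action is computed, while step is meant to be called only at the moments selected by hook, so the decay rate is decoupled from how often actions are drawn.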

Examples using mpcrl.core.exploration.ExplorationStrategy

On-policy Deterministic Policy Gradient