mpcrl.core.exploration.EpsilonGreedyExploration

class mpcrl.core.exploration.EpsilonGreedyExploration(epsilon, strength, hook='on_update', mode='gradient-based', seed=None)

Bases: GreedyExploration

Epsilon-greedy strategy for perturbing the policy: the MPC policy is randomly perturbed only occasionally, with probability epsilon.

Parameters:
epsilon : scheduler or float

The probability of exploring. Should be in the range [0, 1]. If passed as an mpcrl.schedulers.Scheduler, the probability can be scheduled to decay or increase each time step is called. Otherwise, it is kept constant.

strength : scheduler or array/supports-algebraic-operations

The strength of the exploration. If passed as an mpcrl.schedulers.Scheduler, the strength can be scheduled to decay or increase each time step is called. Otherwise, it is kept constant.

hook : {"on_update", "on_episode_end", "on_timestep_end"}, optional

Specifies which callback to hook onto, i.e., when the exploration's schedulers (if any) are stepped, e.g., to decay the probability of exploring or the perturbation strength (see also step). The options are:

  • "on_update", which steps the exploration after each agent’s update

  • "on_episode_end", which steps the exploration after each episode ends

  • "on_timestep_end", which steps the exploration after each env’s timestep.

By default, "on_update" is selected.

mode : {"gradient-based", "additive"}, optional

Mode of application of explorative perturbations to the MPC. If "additive", the drawn perturbation is added to the optimal action computed by the MPC solver. By default, "gradient-based" is selected; in this mode, the perturbation enters the MPC objective directly, multiplied by the first action, and thus affects the gradient of the objective.

seed : None, int, array_like of ints, SeedSequence, BitGenerator, Generator

Seed for the numpy.random.Generator used to randomize the exploration. By default, None.
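To illustrate the difference between the two values of mode, the following is a minimal sketch on a toy scalar problem, not the library's implementation. The quadratic objective, the function names, and the hessian argument are assumptions introduced here for illustration only.

```python
def additive_action(u_opt, perturbation):
    """Additive mode (sketch): perturb the action after the solver returned it."""
    return u_opt + perturbation


def gradient_based_action(u_star, perturbation, hessian=1.0):
    """Gradient-based mode (sketch) on a hypothetical toy objective.

    Toy objective: 0.5 * hessian * (u - u_star)**2 + perturbation * u
    The perturbation multiplies the action inside the objective, so it
    shifts the gradient. Setting the derivative to zero gives
    hessian * (u - u_star) + perturbation = 0.
    """
    return u_star - perturbation / hessian


u_additive = additive_action(1.0, 0.2)
u_gradient = gradient_based_action(1.0, 0.2, hessian=2.0)
```

In this toy case, the additive perturbation shifts the action by the perturbation itself, whereas the gradient-based perturbation shifts the minimizer by an amount that depends on the curvature of the objective.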

Methods

can_explore()

Computes whether, according to the exploration strategy, the agent should explore at the current instant.

perturbation(method, *args, **kwargs)

Returns a random perturbation.

reset([seed])

Resets the exploration status, in case it is non-deterministic.

step(*_, **__)

Steps (i.e., decays or increases) the exploration strength and probability according to their schedulers.

Attributes

hook

Gets which callback the exploration is hooked on, i.e., when the exploration's schedulers (if any) are stepped, e.g., to decay the probability of exploring or the perturbation strength (see also step).

mode

Gets the mode of application of explorative perturbations to the MPC.

can_explore()

Computes whether, according to the exploration strategy, the agent should explore at the current instant.

Returns:
bool

True if the agent should explore according to this strategy; otherwise, False.

Return type:

bool
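The decision described above can be sketched as a single uniform draw compared against epsilon. This is a minimal illustration under assumptions, not the library's code: the free function can_explore, its rng argument, and the strict comparison are choices made here.

```python
import numpy as np


def can_explore(rng, epsilon):
    """Sketch: return True with probability epsilon.

    rng.random() draws from [0, 1), so epsilon = 0 never explores and
    epsilon = 1 always explores under this strict comparison.
    """
    return bool(rng.random() < epsilon)


rng = np.random.default_rng(0)
draws = [can_explore(rng, 0.1) for _ in range(10_000)]
explore_fraction = sum(draws) / len(draws)  # empirically close to 0.1
always = all(can_explore(rng, 1.0) for _ in range(100))
never = any(can_explore(rng, 0.0) for _ in range(100))
```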

property hook: Literal['on_update', 'on_episode_end', 'on_timestep_end'] | None

Gets which callback the exploration is hooked on, i.e., when the exploration's schedulers (if any) are stepped, e.g., to decay the probability of exploring or the perturbation strength (see also step). Can be None if no hook is needed.

property mode: Literal['gradient-based', 'additive']

Gets the mode of application of explorative perturbations to the MPC.

perturbation(method, *args, **kwargs)

Returns a random perturbation.

Parameters:
method : str

The name of a method available on numpy.random.Generator, e.g., "random" for numpy.random.Generator.random, "normal" for numpy.random.Generator.normal, etc.

args, kwargs

Positional and keyword arguments with which to call that method.

Returns:
array

An array representing the perturbation.

Return type:

ndarray[tuple[Any, ...], dtype[floating]]
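Dispatching on the method name, as described above, can be sketched with getattr on a numpy.random.Generator. The free function below and its rng argument are assumptions for illustration. Whether and where the exploration strength scales the result is not shown here.

```python
import numpy as np


def perturbation(rng, method, *args, **kwargs):
    """Sketch: look up `method` on the generator and call it with the given arguments."""
    return getattr(rng, method)(*args, **kwargs)


rng = np.random.default_rng(42)
# "random" maps to numpy.random.Generator.random (uniform on [0, 1))
uniform_sample = perturbation(rng, "random", size=3)
# "normal" maps to numpy.random.Generator.normal
normal_sample = perturbation(rng, "normal", loc=0.0, scale=0.5, size=(2, 1))
```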

reset(seed=None)

Resets the exploration status, in case it is non-deterministic.

Return type:

None
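As a rough illustration of why resetting with a seed matters for reproducibility, the sketch below uses a hypothetical stand-in class (SeededExploration is not part of the library) that recreates its generator on reset. Resetting with the same seed reproduces the same random draws.

```python
import numpy as np


class SeededExploration:
    """Hypothetical stand-in that holds a reseedable random generator."""

    def __init__(self, seed=None):
        self.reset(seed)

    def reset(self, seed=None):
        # Recreate the generator so that subsequent draws restart from `seed`.
        self.np_random = np.random.default_rng(seed)


exploration = SeededExploration(seed=7)
first = exploration.np_random.random(4)
exploration.reset(seed=7)
second = exploration.np_random.random(4)  # same values as `first`
```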

step(*_, **__)

Steps (i.e., decays or increases) the exploration strength and probability according to their schedulers.

Return type:

None
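The effect of stepping can be sketched with toy schedulers. The classes ToyExponentialDecay and ToyConstant and the function step_exploration below are hypothetical stand-ins written for this example; they only assume that a scheduler exposes a current value and a step method.

```python
class ToyExponentialDecay:
    """Hypothetical scheduler: multiplies its value by `factor` on each step."""

    def __init__(self, init_value, factor):
        self.value = init_value
        self.factor = factor

    def step(self):
        self.value *= self.factor


class ToyConstant:
    """Hypothetical scheduler: keeps its value unchanged on each step."""

    def __init__(self, value):
        self.value = value

    def step(self):
        pass


def step_exploration(epsilon, strength):
    """Sketch: step both the exploration probability and the strength."""
    epsilon.step()
    strength.step()


epsilon = ToyExponentialDecay(init_value=0.5, factor=0.5)
strength = ToyConstant(1.0)
for _ in range(3):  # e.g., three agent updates with hook="on_update"
    step_exploration(epsilon, strength)
```

After three steps in this sketch, the exploration probability has decayed from 0.5 to 0.0625 while the strength is unchanged.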