mpcrl.core.exploration.EpsilonGreedyExploration
- class mpcrl.core.exploration.EpsilonGreedyExploration(epsilon, strength, hook='on_update', mode='gradient-based', seed=None)
Bases: GreedyExploration
Epsilon-greedy strategy for perturbing the policy, which applies a random perturbation to the MPC policy only occasionally.
- Parameters:
- epsilon : scheduler or float
The probability to explore. Should be in the range [0, 1]. If passed in the form of an mpcrl.schedulers.Scheduler, then the probability can be scheduled to decay or increase every time step is called. Otherwise, it is kept constant.
- strength : scheduler or array/supports-algebraic-operations
The strength of the exploration. If passed in the form of an mpcrl.schedulers.Scheduler, then the strength can be scheduled to decay or increase every time step is called. Otherwise, it is kept constant.
- hook : {“on_update”, “on_episode_end”, “on_timestep_end”}, optional
Specifies which callback to hook onto, i.e., when to step the exploration’s schedulers (if any) to, e.g., decay the chances of exploring or the perturbation strength (see also step). The options are:
  - "on_update", which steps the exploration after each agent’s update
  - "on_episode_end", which steps the exploration after each episode ends
  - "on_timestep_end", which steps the exploration after each env’s timestep.
By default, "on_update" is selected.
- mode : {“gradient-based”, “additive”}, optional
Mode of application of explorative perturbations to the MPC. If "additive", then the drawn perturbation is added to the optimal action computed by the MPC solver. By default, "gradient-based" is selected, and in this mode the perturbation enters directly in the MPC objective and is multiplied by the first action, thus affecting its gradient.
- seed : None, int, array_like of ints, SeedSequence, BitGenerator, Generator
Number to seed the numpy.random.Generator used for randomizing the exploration. By default, None.
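The scheduling behaviour described for epsilon and strength can be illustrated with a minimal, library-independent sketch. The names `GeometricDecay` and `ScheduledExplorationSketch` are hypothetical and are not part of mpcrl; the sketch only assumes that a scheduled parameter changes whenever step is called, while a plain number stays constant.

```python
from dataclasses import dataclass


@dataclass
class GeometricDecay:
    """Hypothetical stand-in for a scheduler: multiplies `value` by `factor` per step."""

    value: float
    factor: float

    def step(self) -> None:
        self.value *= self.factor


class ScheduledExplorationSketch:
    """Illustrative only; this is not the mpcrl implementation."""

    def __init__(self, epsilon, strength):
        # Each argument is either a plain number (kept constant) or a GeometricDecay.
        self._epsilon = epsilon
        self._strength = strength

    @staticmethod
    def _current(param):
        # Read the current value regardless of whether the parameter is scheduled.
        return param.value if isinstance(param, GeometricDecay) else param

    @property
    def epsilon(self):
        return self._current(self._epsilon)

    @property
    def strength(self):
        return self._current(self._strength)

    def step(self) -> None:
        # Only scheduled parameters change; constants are left untouched.
        for param in (self._epsilon, self._strength):
            if isinstance(param, GeometricDecay):
                param.step()


# Epsilon halves on every step, while the strength is a constant.
sketch = ScheduledExplorationSketch(epsilon=GeometricDecay(0.5, 0.5), strength=0.1)
sketch.step()
sketch.step()
print(sketch.epsilon, sketch.strength)
```

In the class documented here, the hook parameter determines which callback triggers the stepping; the sketch calls step by hand instead.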
Methods
can_explore(): Computes whether, according to the exploration strategy, the agent should explore or not now, at the current instant.
perturbation(method, *args, **kwargs): Returns a random perturbation.
reset([seed]): Resets the exploration status, in case it is non-deterministic.
step(*_, **__): Steps (i.e., decays or increases) the exploration strength and probability according to their schedulers.
Attributes
hook: Gets which callback the exploration is hooked on, i.e., when to step the exploration's schedulers (if any) to, e.g., decay the chances of exploring or the perturbation strength (see also step).
mode: Gets the mode of application of explorative perturbations to the MPC.
- can_explore()
Computes whether, according to the exploration strategy, the agent should explore or not now, at the current instant.
- Returns:
- bool
True if the agent should explore according to this strategy; otherwise, False.
- Return type:
bool
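As a rough illustration of an epsilon-greedy decision, the sketch below draws a uniform sample from a seeded numpy.random.Generator and compares it with epsilon. The helper `can_explore_sketch` is hypothetical, and the exact comparison used by the library is an assumption here.

```python
import numpy as np


def can_explore_sketch(rng: np.random.Generator, epsilon: float) -> bool:
    """Return True with probability `epsilon` (hypothetical helper)."""
    # rng.random() is uniform on [0, 1), so epsilon=0 never explores
    # and epsilon=1 always explores.
    return bool(rng.random() < epsilon)


rng = np.random.default_rng(seed=42)
decisions = [can_explore_sketch(rng, 0.3) for _ in range(10_000)]
explored_fraction = sum(decisions) / len(decisions)
print(f"explored in {explored_fraction:.1%} of the draws")
```

Over many draws the explored fraction should approach epsilon, which is what "only occasionally" means in practice.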
- property hook: Literal['on_update', 'on_episode_end', 'on_timestep_end'] | None
Gets which callback the exploration is hooked on, i.e., when to step the exploration’s schedulers (if any) to, e.g., decay the chances of exploring or the perturbation strength (see also step). Can be None in case no hook is needed.
- property mode: Literal['gradient-based', 'additive']
Gets the mode of application of explorative perturbations to the MPC.
- perturbation(method, *args, **kwargs)
Returns a random perturbation.
- Parameters:
- method : str
The name of a method from the ones available to numpy.random.Generator, e.g., "random" for numpy.random.Generator.random, "normal" for numpy.random.Generator.normal, etc.
- args, kwargs
Args and kwargs with which to call such method.
- Returns:
- array
An array representing the perturbation.
- Return type:
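One way to picture the method-name dispatch described for perturbation is the sketch below. It looks the name up on a numpy.random.Generator and forwards the arguments. The function `perturbation_sketch` and the example action values are hypothetical, any scaling by the exploration strength is left out, and this is not necessarily how the library implements it.

```python
import numpy as np


def perturbation_sketch(rng: np.random.Generator, method: str, *args, **kwargs):
    """Call the Generator method named `method` with the given arguments."""
    return getattr(rng, method)(*args, **kwargs)


rng = np.random.default_rng(0)

# "normal" resolves to numpy.random.Generator.normal.
noise = perturbation_sketch(rng, "normal", loc=0.0, scale=0.1, size=(2, 1))

# "random" resolves to numpy.random.Generator.random (uniform on [0, 1)).
uniform = perturbation_sketch(rng, "random", size=3)

# In "additive" mode, the drawn perturbation is added to the optimal action.
optimal_action = np.array([[1.0], [-0.5]])  # hypothetical MPC output
perturbed_action = optimal_action + noise
```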