mpcrl.core.exploration.OrnsteinUhlenbeckExploration
- class mpcrl.core.exploration.OrnsteinUhlenbeckExploration(mean, sigma, theta=0.15, dt=1.0, initial_noise=None, hook='on_update', mode='gradient-based', seed=None)
Bases: ExplorationStrategy
Exploration based on the Ornstein-Uhlenbeck Brownian motion with friction.
Inspired by stable_baselines3.common.noise.OrnsteinUhlenbeckActionNoise.
- Parameters:
- mean : scheduler or array/supports-algebraic-operations
Mean of the stochastic process. Should have the same shape as the action.
- sigma : scheduler or array/supports-algebraic-operations
Standard deviation of the stochastic process. Should have the same shape as the action.
- theta : float, optional
Coefficient of attraction of the process towards the mean, by default 0.15.
- dt : float, optional
Time step of the process, by default 1.0.
- initial_noise : array-like, optional
A default initial noise. By default None, in which case it is set to zero.
- hook : {"on_update", "on_episode_end", "on_timestep_end"}, optional
Specifies which callback to hook onto, i.e., when to step the exploration’s schedulers (if any) to, e.g., decay the chances of exploring or the perturbation strength (see also step). The options are
  - "on_update", which steps the exploration after each agent’s update
  - "on_episode_end", which steps the exploration after each episode ends
  - "on_timestep_end", which steps the exploration after each env’s timestep.
By default, "on_update" is selected.
- mode : {"gradient-based", "additive"}, optional
Mode of application of explorative perturbations to the MPC. If "additive", then the drawn perturbation is added to the optimal action computed by the MPC solver. By default, "gradient-based" is selected; in this mode the perturbation enters the MPC objective directly, multiplied by the first action, thus affecting its gradient.
- seed : None, int, array_like of ints, SeedSequence, BitGenerator, Generator
Number to seed the numpy.random.Generator used for randomizing the exploration. By default, None.
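To make the roles of mean, sigma, theta, dt and initial_noise concrete, here is a minimal scalar sketch of an Ornstein-Uhlenbeck recursion. It follows the Euler-Maruyama update used by the stable_baselines3 noise class that this strategy cites as its inspiration; that this class discretizes the process in exactly the same way is an assumption. The function name ou_noise and the steps argument are hypothetical, and the standard-library random module stands in for numpy.random.Generator to keep the sketch self-contained.

```python
import math
import random


def ou_noise(mean, sigma, theta=0.15, dt=1.0, initial_noise=0.0, steps=5, seed=None):
    """Simulates a scalar Ornstein-Uhlenbeck process (hypothetical helper)."""
    rng = random.Random(seed)  # stand-in for numpy.random.Generator
    noise = initial_noise
    samples = []
    for _ in range(steps):
        # Friction term pulls the noise towards the mean at rate theta;
        # the Gaussian term, scaled by sigma * sqrt(dt), diffuses it.
        drift = theta * (mean - noise) * dt
        diffusion = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        noise = noise + drift + diffusion
        samples.append(noise)
    return samples


# With sigma = 0 the recursion is deterministic and approaches the mean geometrically.
deterministic = ou_noise(mean=1.0, sigma=0.0, theta=0.5, steps=3)
print(deterministic)

# With sigma > 0 consecutive samples are correlated, unlike white Gaussian noise.
noisy = ou_noise(mean=0.0, sigma=0.2, seed=42)
print(noisy)
```

A larger theta pulls the noise back to the mean faster, while sigma sets how strongly each step is perturbed.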
Methods
can_explore(): Computes whether, according to the exploration strategy, the agent should explore or not now, at the current instant.
perturbation(*_, size, **__): Returns a random perturbation.
reset([seed]): Resets the exploration status, in case it is non-deterministic.
step(*_, **__): Updates (i.e., decays or increases) the mean and standard deviation of the perturbation according to their schedulers.
Attributes
hook: Gets which callback the exploration is hooked on, i.e., when to step the exploration's schedulers (if any) to, e.g., decay the chances of exploring or the perturbation strength (see also step).
mode: Gets the mode of application of explorative perturbations to the MPC.
- can_explore()
Computes whether, according to the exploration strategy, the agent should explore or not now, at the current instant.
- Returns:
- bool
True if the agent should explore according to this strategy; otherwise, False.
- Return type:
- bool
- property hook: Literal['on_update', 'on_episode_end', 'on_timestep_end'] | None
Gets which callback the exploration is hooked on, i.e., when to step the exploration’s schedulers (if any) to, e.g., decay the chances of exploring or the perturbation strength (see also step). Can be None in case no hook is needed.
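The effect of the hook choice can be illustrated with a toy training loop. This is only a sketch, not the library's callback machinery: the ExponentialDecay class, the run function, and the assumption that the agent updates every two timesteps are all hypothetical and serve only to show how often a scheduler would be stepped under each option.

```python
class ExponentialDecay:
    """Hypothetical scheduler: multiplies its value by a constant factor per step."""

    def __init__(self, value, factor):
        self.value = value
        self.factor = factor

    def step(self):
        self.value *= self.factor


def run(hook, episodes=2, timesteps=3, update_every=2):
    """Returns the final scheduled value after a toy loop with the given hook."""
    sigma = ExponentialDecay(value=1.0, factor=0.5)
    t = 0
    for _ in range(episodes):
        for _ in range(timesteps):
            t += 1
            if hook == "on_timestep_end":
                sigma.step()  # stepped after every env timestep
            # Assumption: the toy agent updates once every `update_every` timesteps.
            if hook == "on_update" and t % update_every == 0:
                sigma.step()  # stepped after every agent update
        if hook == "on_episode_end":
            sigma.step()  # stepped once per finished episode
    return sigma.value


for hook in ("on_update", "on_episode_end", "on_timestep_end"):
    print(hook, run(hook))
```

In this toy setting, "on_timestep_end" decays the perturbation strength fastest and "on_episode_end" slowest, so the hook effectively sets the time scale of the exploration decay.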
- property mode: Literal['gradient-based', 'additive']
Gets the mode of application of explorative perturbations to the MPC.
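The difference between the two modes can be sketched on a toy scalar problem. The quadratic objective, the solve_toy_mpc helper, and the numbers below are hypothetical and unconstrained; a real MPC problem has dynamics and constraints, so this only illustrates where the perturbation enters, not how the library implements it.

```python
def solve_toy_mpc(h, g, perturbation=0.0):
    """Minimizes 0.5*h*u**2 + (g + perturbation)*u over a scalar first action u.

    Hypothetical stand-in for an MPC solver; requires h > 0. The perturbation
    multiplies u in the objective, so it shifts the gradient with respect to u.
    """
    return -(g + perturbation) / h


h, g, p = 2.0, -4.0, 1.0

# Nominal optimal first action, without exploration.
u_nominal = solve_toy_mpc(h, g)

# "additive": the perturbation is added to the action after solving.
u_additive = u_nominal + p

# "gradient-based": the perturbation enters the objective before solving.
u_gradient = solve_toy_mpc(h, g, perturbation=p)

print(u_nominal, u_additive, u_gradient)
```

In the additive case the action moves by exactly the perturbation, whereas in the gradient-based case the shift is shaped by the objective's curvature (here, divided by h) because the perturbed action is still the minimizer of an optimization problem.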