mpcrl.optim.GradientDescent
- class mpcrl.optim.GradientDescent(learning_rate, weight_decay=0.0, momentum=0.0, dampening=0.0, nesterov=False, hook='on_update', max_percentage_update=inf, bound_consistency=False)
Bases: GradientBasedOptimizer[LrType]
First-order gradient descent optimizer, based on [14] and torch.optim.SGD.
In its basic formulation, this optimizer updates the parameters as
\[\theta \gets \theta - \alpha g,\]
where \(\theta\) are the learnable parameters, \(\alpha\) is the learning rate (which may also be a vector of per-parameter rates), and \(g\) is the gradient of the loss function w.r.t. the parameters. If momentum or weight decay is used, the gradient \(g\) is modified before it is applied, but the update rule remains the same. However, when the parameter space is constrained, a Quadratic Programming (QP) problem must be solved to ensure the parameters stay within their bounds. For gradient descent, the QP problem is
\[\begin{split}\begin{aligned} \min_{\Delta\theta} & \quad \frac{1}{2} \lVert \Delta\theta \rVert_2^2 + \alpha g^\top \Delta\theta \\ \text{s.t.} & \quad \theta_{\text{lower}} \leq \theta + \Delta\theta \leq \theta_{\text{upper}} \end{aligned}\end{split}\]
followed by the update \(\theta \gets \theta + \Delta\theta\).
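As an illustration of the QP above, note that its objective is separable with an identity Hessian, so with simple box bounds the minimizer is the unconstrained step \(-\alpha g\) clipped to the interval \([\theta_{\text{lower}} - \theta, \theta_{\text{upper}} - \theta]\). The sketch below shows this closed form with NumPy. It is not the library's implementation (which, per the description, hands the problem to a QP solver), and the function name `constrained_gd_step` and the sample values are hypothetical.

```python
import numpy as np


def constrained_gd_step(theta, grad, lr, lower, upper):
    """Closed-form solution of the box-constrained gradient descent QP.

    Hypothetical helper for illustration only; not part of mpcrl.
    """
    # Unconstrained minimizer of 0.5 * ||d||^2 + lr * g^T d is d = -lr * g.
    step = -lr * grad
    # The objective is separable, so the constrained minimizer is the
    # elementwise projection of that step onto [lower - theta, upper - theta].
    step = np.clip(step, lower - theta, upper - theta)
    return theta + step


# Sample values (assumptions chosen so that one bound becomes active).
theta = np.array([1.0, 2.0, -0.5])
grad = np.array([4.0, -1.0, 0.5])
lower = np.array([0.0, 0.0, -1.0])
upper = np.array([5.0, 2.05, 1.0])

new_theta = constrained_gd_step(theta, grad, 0.1, lower, upper)
print(new_theta)
```

With these values, the second parameter would move to 2.1 without bounds, so it stops at its upper bound of 2.05, while the other two take the plain step \(-\alpha g\).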
- Parameters:
- learning_rate float or array or mpcrl.core.schedulers.Scheduler
The learning rate of the optimizer. It can be:
- a float, in case the learning rate must stay constant and is the same for all learnable parameters
- an array, in case the learning rate must stay constant but is different for each parameter (should have the same size as the number of learnable parameters)
- a mpcrl.core.schedulers.Scheduler, in case the learning rate can vary during the learning process (usually, it is set to decay). See the hook argument for more details on when this scheduler is stepped.
- weight_decay float, optional
A non-negative float that specifies the decay of the learnable parameters in the form of an L2 regularization term. By default, it is set to 0.0, so no decay/regularization takes place.
- momentum float, optional
A non-negative float that specifies the momentum factor. By default, it is set to 0.0, so no momentum is used.
- dampening float, optional
A non-negative float that specifies the dampening factor for the momentum. By default, it is set to 0.0, so no dampening is used.
- nesterov bool, optional
A boolean that specifies whether to use Nesterov momentum. By default, it is set to False.
- hook {"on_update", "on_episode_end", "on_timestep_end"}, optional
Specifies when to step the scheduler of the optimizer's learning rate to decay its value. This allows the rate to vary over the learning iterations. The options are:
- "on_update" steps the learning rate after each agent's update
- "on_episode_end" steps the learning rate after each episode's end
- "on_timestep_end" steps the learning rate after each env's timestep.
By default, "on_update" is selected.
- max_percentage_update float, optional
A positive float that specifies the maximum percentage change the learnable parameters can experience in each update. For example, max_percentage_update=0.5 means that the parameters can be updated by up to 50% of their current value. By default, it is set to +inf. If specified, the update becomes constrained and has to be solved as a QP, which is inevitably slower than its unconstrained counterpart (a linear system).
- bound_consistency bool, optional
A boolean that, if True, forces the learnable parameters to lie in their bounds when updated. This is done via numpy.clip. Only beneficial if numerical issues arise during updates, e.g., due to the QP solver not being able to guarantee bounds.
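To make the roles of weight_decay, momentum, dampening and nesterov concrete, the sketch below shows how the gradient could be modified before the update, assuming the same conventions as torch.optim.SGD, which the class description cites as its basis. Whether mpcrl follows these conventions in every detail is an assumption, and the helper name `modify_gradient` and its momentum buffer argument `buf` are hypothetical.

```python
import numpy as np


def modify_gradient(grad, theta, buf, weight_decay=0.0, momentum=0.0,
                    dampening=0.0, nesterov=False):
    """Return the modified gradient and the updated momentum buffer.

    Hypothetical helper mirroring the torch.optim.SGD conventions;
    not part of mpcrl.
    """
    g = grad
    if weight_decay != 0.0:
        # L2 regularization: add the decay term to the gradient.
        g = g + weight_decay * theta
    if momentum != 0.0:
        if buf is None:
            # First call: the buffer is initialized with the gradient itself.
            buf = g.copy()
        else:
            # Later calls: dampening scales the new gradient's contribution.
            buf = momentum * buf + (1.0 - dampening) * g
        # Nesterov momentum looks ahead along the buffer direction.
        g = g + momentum * buf if nesterov else buf
    return g, buf


# Sample values (assumptions for illustration).
theta = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])

g1, buf = modify_gradient(grad, theta, None, weight_decay=0.1, momentum=0.9)
g2, buf = modify_gradient(grad, theta, buf, weight_decay=0.1, momentum=0.9)
print(g1, g2)
```

The returned gradient then takes the place of \(g\) in the update rule \(\theta \gets \theta - \alpha g\). With all four options left at their defaults, the function returns the gradient unchanged.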
Methods
set_learnable_parameters(pars): Makes the optimization class aware of the dictionary of the learnable parameters whose values are to be updated.
step(*_, **__): Steps/decays the learning rate according to its scheduler.
update(gradient[, hessian]): Computes the gradient-based update of the learnable parameters dictated by the current RL algorithm.
Attributes
hook: Gets the hook to which the scheduler is attached, i.e., when to step the learning rate's scheduler to decay its value.
order: Gets the order of the optimizer: 1 for first-order, 2 for second-order.
- property hook: str | None
Gets the hook to which the scheduler is attached, i.e., when to step the learning rate's scheduler to decay its value.
- Returns:
- optional str
The hook to which the scheduler is attached. Can be None in case no hook is needed (e.g., a scheduler was not passed as learning_rate).
- property order: Literal[1, 2]
Gets the order of the optimizer: 1 for first-order, 2 for second-order.
- Returns:
- 1 or 2
The order of the optimizer.
- set_learnable_parameters(pars)
Makes the optimization class aware of the dictionary of the learnable parameters whose values are to be updated.
- Parameters:
- pars mpcrl.LearnableParametersDict
The dictionary of the learnable parameters.
- Return type:
- update(gradient, hessian=None)
Computes the gradient-based update of the learnable parameters dictated by the current RL algorithm.
- Parameters:
- gradient 1D array
The gradient of the learnable parameters.
- hessian 2D array, optional
The hessian of the learnable parameters. When the optimizer is first-order, it is expected to be None since it is unused. When the optimizer is second-order, it is expected to be a 2D array.
- Returns:
- status str, optional
An optional string containing the status of the update, e.g., the status of the QP solver, if used.
- Return type: