mpcrl.util.control.dlqr

mpcrl.util.control.dlqr(A, B, Q, R, M=None)

Computes the solution to the discrete-time LQR problem.

The LQR problem is the infinite-horizon optimization problem

\[\min_{u} \sum_{t=0}^{\infty} \left( x_t^\top Q x_t + u_t^\top R u_t + 2 x_t^\top M u_t \right)\]

for the linear time-invariant discrete-time system

\[x_{t+1} = A x_t + B u_t.\]

The (famous) solution takes the form of a state feedback law

\[u_t = -K x_t\]

with a quadratic cost-to-go function

\[V(x_t) = x_t^\top P x_t.\]

The function returns the optimal state feedback matrix \(K\) and the quadratic terminal cost-to-go matrix \(P\). If \(M\) is not provided, it is assumed to be zero.
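For reference, the textbook characterization of \(K\) and \(P\) for the cost and dynamics above is sketched below. It assumes the usual conditions (for example, \((A, B)\) stabilizable and \(R\) positive definite), and it is not necessarily how this function computes its result. The matrix \(P\) satisfies the discrete-time algebraic Riccati equation with cross term

\[P = Q + A^\top P A - (A^\top P B + M)(R + B^\top P B)^{-1}(B^\top P A + M^\top),\]

and the feedback gain follows from minimizing the one-step cost plus the cost-to-go \(x_{t+1}^\top P x_{t+1}\) over \(u_t\):

\[K = (R + B^\top P B)^{-1}(B^\top P A + M^\top).\]

With \(M = 0\) these reduce to the familiar forms \(P = Q + A^\top P A - A^\top P B (R + B^\top P B)^{-1} B^\top P A\) and \(K = (R + B^\top P B)^{-1} B^\top P A\).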

Parameters:

A (array): State matrix.
B (array): Control input matrix.
Q (array): State weighting matrix.
R (array): Control input weighting matrix.
M (array, optional): Mixed state-input weighting matrix, by default None.

Returns:

tuple of two arrays: The optimal state feedback matrix \(K\) and the quadratic terminal cost-to-go matrix \(P\).

Return type: tuple[ndarray[tuple[Any, ...], dtype[floating]], ndarray[tuple[Any, ...], dtype[floating]]]
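To illustrate what the returned pair \(K\), \(P\) represents, here is a minimal NumPy-only sketch. The helper `dlqr_sketch` is hypothetical and is not the library's implementation: it simply iterates the Riccati recursion until it stops changing, which is one simple way to approximate the solution. The double-integrator matrices are illustrative values chosen for this example.

```python
import numpy as np


def dlqr_sketch(A, B, Q, R, M=None, iters=10_000, tol=1e-12):
    """Hypothetical stand-in for dlqr: fixed-point iteration on the Riccati recursion.

    Assumes 2D inputs with (A, B) stabilizable and R positive definite.
    """
    A, B, Q, R = (np.asarray(X, dtype=float) for X in (A, B, Q, R))
    # As in the documented function, a missing M is treated as zero.
    if M is None:
        M = np.zeros((A.shape[0], B.shape[1]))
    else:
        M = np.asarray(M, dtype=float)

    P = Q.copy()
    for _ in range(iters):
        S = B.T @ P @ A + M.T                    # B'PA + M'
        K = np.linalg.solve(R + B.T @ P @ B, S)  # gain for the current P
        P_next = Q + A.T @ P @ A - S.T @ K       # one Riccati step
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break

    # Recompute the gain so that it matches the final P.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A + M.T)
    return K, P


# Illustrative system: a discrete-time double integrator (assumed values).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K, P = dlqr_sketch(A, B, Q, R)

# The feedback law u_t = -K x_t should make the closed loop A - B K stable.
spectral_radius = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
print("K =", K)
print("P =", P)
print("closed-loop spectral radius =", spectral_radius)
```

With the package installed, the corresponding call per the signature above would be `K, P = dlqr(A, B, Q, R)` after `from mpcrl.util.control import dlqr`; the two results should agree up to numerical tolerance if the sketch's assumptions hold.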

Examples using mpcrl.util.control.dlqr

Off-policy Q-learning

On-policy Q-learning

On-policy Deterministic Policy Gradient
