mpcrl.util.control.dlqr

mpcrl.util.control.dlqr(A, B, Q, R, M=None)

Computes the solution to the discrete-time LQR problem.

The infinite-horizon LQR problem is the optimization problem

\[\min_{u} \sum_{t=0}^{\infty} \left( x_t^\top Q x_t + u_t^\top R u_t + 2 x_t^\top M u_t \right)\]

for the linear time-invariant discrete-time system

\[x_{t+1} = A x_t + B u_t.\]

The well-known solution takes the form of a linear state feedback law

\[u_t = -K x_t\]

with a quadratic cost-to-go function

\[V(x_t) = x_t^\top P x_t.\]
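
For reference, the relations that tie \(P\) and \(K\) to the problem data can be written out explicitly. The form below is the standard textbook discrete-time algebraic Riccati equation with a cross term, stated under the usual assumptions (stabilizable \((A, B)\), positive definite \(R\), positive semidefinite stage cost); it is not taken from the function's source code.

\[P = Q + A^\top P A - (A^\top P B + M)(R + B^\top P B)^{-1}(B^\top P A + M^\top)\]

\[K = (R + B^\top P B)^{-1}(B^\top P A + M^\top)\]

With \(M = 0\) these reduce to the more common form without mixed state-input weighting.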

The function returns the optimal state feedback matrix \(K\) and the quadratic terminal cost-to-go matrix \(P\). If \(M\) is not provided, it is assumed to be zero.

Parameters:
A : array

State matrix.

B : array

Control input matrix.

Q : array

State weighting matrix.

R : array

Control input weighting matrix.

M : array, optional

Mixed state-input weighting matrix. By default None, in which case it is assumed to be zero.

Returns:
tuple of two arrays

The optimal state feedback matrix \(K\) and the quadratic terminal cost-to-go matrix \(P\).

Return type:

tuple[ndarray[tuple[Any, ...], dtype[floating]], ndarray[tuple[Any, ...], dtype[floating]]]
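
To make the meaning of the two returned matrices concrete, here is a minimal NumPy-only sketch that computes \(K\) and \(P\) by iterating the textbook Riccati recursion \(P \leftarrow Q + A^\top P A - (A^\top P B + M) K\) with \(K = (R + B^\top P B)^{-1}(B^\top P A + M^\top)\). The name `dlqr_sketch` and its `tol` and `max_iter` arguments are hypothetical and chosen for illustration; this is not how `mpcrl.util.control.dlqr` is implemented, and it assumes a stabilizable system with positive definite `R` so that the iteration converges.

```python
import numpy as np


def dlqr_sketch(A, B, Q, R, M=None, tol=1e-10, max_iter=10_000):
    """Illustrative Riccati fixed-point iteration (not mpcrl's implementation)."""
    A, B, Q, R = (np.atleast_2d(np.asarray(X, dtype=float)) for X in (A, B, Q, R))
    n, m = B.shape
    # A missing cross-weight is treated as zero, as the documentation states.
    M = np.zeros((n, m)) if M is None else np.atleast_2d(np.asarray(M, dtype=float))

    P = Q.copy()
    for _ in range(max_iter):
        # Gain associated with the current cost-to-go estimate P.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A + M.T)
        # One Riccati step: P <- Q + A'PA - (A'PB + M) K.
        P_next = Q + A.T @ P @ A - (A.T @ P @ B + M) @ K
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Recompute the gain from the final P so that K and P are consistent.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A + M.T)
    return K, P


# Scalar example: x_{t+1} = x_t + u_t with Q = R = 1 and no cross term.
K, P = dlqr_sketch(A=[[1.0]], B=[[1.0]], Q=[[1.0]], R=[[1.0]])
golden = (1 + np.sqrt(5)) / 2  # analytic root of P**2 - P - 1 = 0
print(P[0, 0], golden)      # the two values should agree closely
print(K[0, 0], 1 / golden)  # the optimal gain is P / (1 + P) = 1 / golden
```

In the scalar case the Riccati equation collapses to \(P^2 - P - 1 = 0\), which makes the result easy to check by hand. The sketch returns the last iterate without warning if `max_iter` is reached, so a production routine would normally rely on a dedicated Riccati solver instead.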

Examples using mpcrl.util.control.dlqr

Off-policy Q-learning

On-policy Q-learning

On-policy Deterministic Policy Gradient