Optimal Control via Combined Inference and Numerical Optimization
Problem statement
Numerical optimization refines a solution efficiently but fails to discover control policies or behaviours far from its point of initialization. Sampling can discover further local optima through exploration, using simpler cost functions.
Aim. Combine inference and second-order trajectory optimization so that the method samples efficiently when no derivative information is available and follows optimized trajectories when derivatives are available.
Numerical solution
The method relies on the Bellman equation and its tangent-space solution.
Here \(\mathbf{p}\left(\mathbf{x}'|\mathbf{x},\mathbf{u}\right)\) and \(\mathbf{p}\left(\mathbf{x}'|\mathbf{x}\right)\) are the controlled and uncontrolled transition probabilities, respectively.
The iLQG distribution is obtained by maximizing \(\ln\exp\left(-\delta Q_t\left(\mathbf{x}_t,\mathbf{u}_t\right)\right)\), and samples from \(\mathcal{C}\) are drawn as \(\vec w_t\sim\mathcal{N}\left(\mathbf{u}_{\mathrm{iLQG},t},Q^{-1}_{\mathbf{uu},t}\right)\), that is, a Gaussian centred on the iLQG control whose covariance is the inverse of the control Hessian.
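The sampling step can be illustrated with a minimal Python sketch. It assumes that the control Hessian at a given time step is symmetric positive definite, so that it can serve as the precision matrix of the Gaussian. The function name `sample_controls` and its arguments are hypothetical and chosen only for illustration; they are not taken from the authors' implementation.

```python
import numpy as np


def sample_controls(u_ilqg, Q_uu, n_samples, rng=None):
    """Draw control samples w ~ N(u_ilqg, Q_uu^{-1}) for a single time step.

    Assumption: Q_uu is symmetric positive definite (e.g. after any
    regularization the optimizer applies); otherwise Cholesky fails.
    """
    rng = np.random.default_rng() if rng is None else rng
    u_ilqg = np.asarray(u_ilqg, dtype=float)
    Q_uu = np.asarray(Q_uu, dtype=float)

    # Factor the precision matrix: Q_uu = L L^T.
    L = np.linalg.cholesky(Q_uu)

    # Standard normal noise, one column per sample.
    z = rng.standard_normal((u_ilqg.size, n_samples))

    # Solving L^T eps = z gives cov(eps) = (L L^T)^{-1} = Q_uu^{-1},
    # without forming the inverse explicitly.
    eps = np.linalg.solve(L.T, z)

    # Shift by the iLQG control; result has shape (n_samples, control_dim).
    return u_ilqg[None, :] + eps.T


if __name__ == "__main__":
    u = np.array([0.5, -1.0])
    Q = np.array([[4.0, 1.0], [1.0, 2.0]])
    samples = sample_controls(u, Q, n_samples=5, rng=np.random.default_rng(0))
    print(samples.shape)
```

Working from the Cholesky factor of the Hessian rather than its explicit inverse is a common numerical choice: a large curvature (a sharply defined optimum) then yields tightly clustered samples, while a flat direction yields wide exploration.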
Results
Conclusion
Contribution
Natural combination of inference and numerical optimization.
Inherent ability to explore.
Combine convex and non-convex costs.
Caveats
Requires a search across the hyperparameters \(\lambda\), \(\beta\), and \(k\).
\(Q_{\mathbf{uu}}^{-1}\) is not necessarily a good measure of confidence.
Constant control input noise affects convergence.
In the worst case, it can perform as badly as the other methods.