Owing to the curse of dimensionality, numerically computing global policies to optimal control problems for complex dynamical systems quickly becomes intractable. Consequently, a number of approximation methods have been developed. However, none of the current methods can quantify by how much the resulting control underperforms the elusive globally optimal solution. We propose Policy Decomposition, an approximation method with explicit suboptimality estimates. Policy Decomposition splits the optimal control problem into lower-dimensional subproblems, whose optimal solutions are recombined to build a control policy for the entire system.
We introduce the value error and its LQR and DDP estimates to predict the suboptimality of policies obtained through such decompositions. Using a cart-pole, a simplified biped and a 3-link planar manipulator as example systems, we find that the estimates correctly identify the best decomposition, yielding control policies in a fraction of the time it takes to compute the optimal control without a notable sacrifice in closed-loop performance. A consequence of decomposing the original optimal control problem is that combinatorially many candidate decompositions must be evaluated to identify the promising ones. We therefore investigate search methods, Genetic Algorithms and Monte-Carlo Tree Search, to prune the space of possibilities. Our results suggest that Policy Decomposition could be an effective alternative for approximating optimal control policies in complex systems.
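The flavor of the LQR-based suboptimality estimate can be sketched on a toy linear system. The dynamics, costs, and coupling strength below are made-up for illustration and are not the thesis' actual systems: each input is designed from its own state only (one candidate decomposition), and the resulting policy's closed-loop cost on the full coupled dynamics is compared against the jointly optimal LQR cost.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hypothetical full system: two unstable scalar states with weak coupling eps
eps = 0.1
A = np.array([[0.5, eps], [eps, 0.5]])
B = np.eye(2)
Q = np.eye(2)  # state cost
R = np.eye(2)  # input cost

# Globally optimal LQR for the full (joint) problem
P_opt = solve_continuous_are(A, B, Q, R)
K_opt = np.linalg.solve(R, B.T @ P_opt)

# Candidate decomposition: each input uses only its own state (coupling ignored)
K_dec = np.zeros((2, 2))
for i in range(2):
    Ai, Bi = A[i:i + 1, i:i + 1], B[i:i + 1, i:i + 1]
    Qi, Ri = Q[i:i + 1, i:i + 1], R[i:i + 1, i:i + 1]
    Pi = solve_continuous_are(Ai, Bi, Qi, Ri)
    K_dec[i, i] = np.linalg.solve(Ri, Bi.T @ Pi)[0, 0]

# Closed-loop cost matrix of the decomposed policy on the FULL dynamics,
# from the Lyapunov equation  Acl^T P + P Acl = -(Q + K^T R K)
Acl = A - B @ K_dec
P_dec = solve_continuous_lyapunov(Acl.T, -(Q + K_dec.T @ R @ K_dec))

# Value-error estimate: extra cost incurred by the decomposition,
# averaged over unit-covariance initial states
value_error = np.trace(P_dec - P_opt)
print(f"estimated value error: {value_error:.4f}")
```

Repeating this cheap computation for every candidate decomposition and ranking by the estimated value error is, loosely, how promising decompositions can be separated from poor ones before any expensive policy computation.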
Hartmut Geyer (Chair)
Zoom Participation. See announcement.