Plan improvement by reasoning about unmodeled degrees of freedom of the world
Planning under uncertainty assumes a model that specifies the probabilistic effects of actions in terms of changes to the state. Given such a model, planning determines a policy that prescribes, at each state, the choice of action that maximizes a reward function. In this work, we observe that the world may have degrees of freedom not necessarily captured in the given model, and that a planning solution based on that model may be sub-optimal compared to the solutions achievable when the additional degrees of freedom are considered. We introduce and formalize the problem of planning while considering feasible modifications to the model of the world that would lead to improved policies. We present an approach that models changes in the world as modifications to the transition probability function, and show that computing the most rewarding feasible modifications reduces to a constrained optimization problem. We then contribute a gradient-based algorithm for solving this optimization problem efficiently. We foresee several possible applications of this work, including some in the field of human-robot interaction.
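The approach described above can be sketched in code under assumptions not stated in the abstract: a toy MDP, a softmax parameterization of the modified transition function, a simple norm budget as the feasibility constraint, and finite-difference gradients standing in for the paper's actual gradient-based algorithm. Everything named below is illustrative, not the authors' implementation.

```python
import numpy as np

# Toy MDP, purely illustrative: 3 states, 2 actions, reward only in state 2.
n_s, n_a, gamma = 3, 2, 0.9
R = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
base_logits = np.random.default_rng(0).normal(size=(n_s, n_a, n_s))

def transitions(delta):
    """Softmax keeps every modified P(s'|s,a) a valid distribution;
    `delta` parameterizes the modification to the world model."""
    z = base_logits + delta
    e = np.exp(z - z.max(axis=2, keepdims=True))
    return e / e.sum(axis=2, keepdims=True)

def optimal_value(P, iters=150):
    """Value iteration: average value of the best policy under model P."""
    V = np.zeros(n_s)
    for _ in range(iters):
        V = (R + gamma * P @ V).max(axis=1)
    return V.mean()

def grad(delta, eps=1e-4):
    """Finite-difference gradient of the planning value with respect to the
    model modification (a stand-in for the paper's gradient-based method)."""
    g = np.zeros_like(delta)
    base = optimal_value(transitions(delta))
    it = np.nditer(delta, flags=["multi_index"])
    for _ in it:
        d = delta.copy()
        d[it.multi_index] += eps
        g[it.multi_index] = (optimal_value(transitions(d)) - base) / eps
    return g

# Gradient ascent on the modification, projected onto a feasibility set
# (here, a norm budget bounding how much the world may be changed).
delta = np.zeros((n_s, n_a, n_s))
budget = 1.0
v0 = optimal_value(transitions(delta))  # planning value under the given model
best_v, best_delta = v0, delta.copy()
for _ in range(30):
    delta = delta + 0.5 * grad(delta)
    norm = np.linalg.norm(delta)
    if norm > budget:
        delta *= budget / norm          # project back into the budget
    v = optimal_value(transitions(delta))
    if v > best_v:
        best_v, best_delta = v, delta.copy()

print(best_v >= v0)  # prints True: the feasible modification never hurts
```

The constrained optimization appears here as projected gradient ascent: each step improves the value of the best policy under the modified model, then projection enforces feasibility of the modification.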
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement