Offline POMDP Planning

 POMDPs provide a principled mathematical framework for planning and decision making under uncertainty. However, POMDPs are often avoided in practice, because solving POMDPs exactly is computationally intractable. Not long ago, the best algorithms could spend hours computing exact solutions to POMDPs with only a dozen states. In recent years, point-based algorithms have made impressive progress in computing approximately optimal solutions for large POMDPs: POMDPs with hundreds of states have been solved in a matter of seconds. This project consists of two main thrusts: understand why the point-based algorithms work well, whether there are sub-classes of POMDPs that are computationally easier, and what complexity measures are suitable for capturing the difficulty of POMDP planning for point-based algorithms, and develop efficient point-based POMDP algorithms by exploting the insights from the theoretical analysis. Software Dec 15, 2009 APPL release 0.91 is available for download here. Feb 15, 2009 APPL release 0.3 is available for download here. Jul 17, 2008 APPL release 0.2 is available for download here. Apr 15, 2008 APPL, our software package for solving POMDPs, is available for download here.

The Covering Number and The Complexity of POMDP Planning

Intuitively, the computational intractability is due to the “curse of dimensionality”: the belief space B used in solving a POMDP typically has dimensionality equal to |S|, the number of states in the POMDP, and therefore the size of B grows exponentially with |S|. As a result, the number of states is often used as an important complexity measure for POMDP planning. However, if the number of states was an accurate measure of the “true” complexity of a POMDP, it would seem surprising that point-based algorithms can obtain (approximate) solutions in seconds in belief spaces of hundreds of dimensions.

We propose to use the covering number as an alternative measure of the complexity of POMDP planning. As expected, not all belief spaces have the same complexity even if they have the same dimenionality. Consider the example below. Although B and B′ are both two-dimensional, it is intuitively clear that the size of B′, which consists only three points, is much smaller than of that B.

The covering number captures several interesting structural properties that drastically reduce the size of the belief space, such as sparsity and factorization. Using the idea of the covering number, we have identified several conditions under which approximate POMDP solutions can be computed efficiently.

SARSOP

One key idea of point-based POMDP algorithms is to sample a set of points from a belief space Band use it as an approximate representation of B, instead of representing B exactly. Some early POMDP algorithms sample the entire belief space B, using a uniform sampling distribution, such as a grid. However, it is difficult to sample a representative set of points from B, due to its large size. More recent point-based algorithms sample only R, the subset of belief points reachable from a given initial point b0 in B, under arbitrary sequences of actions. It is generally believed that R is much smaller than B. Indeed, focusing on R allows point-based algorithms to scale up to larger problems. To push further in this direction, we would like to sample near R*, the subset of belief points reachable from b0 under optimal sequences of actions, as  R* is usually much smaller than R. Of course, the optimal sequences of actions constitute exactly the POMDP solution, which is unknown in advance. The main idea of our algorithm is to compute successive approximations of R* through sampling and converge to it iteratively. Hence the name SARSOP, which stands for Successive Approximations of the Reachable Space under Optimal Policies.

 Navigation. Grasping. Target tracking. Integrated exploration

In simulation, we successfully applied SARSOP to a set of common robotic tasks, including instances of coastal navigation, grasping, mobile robot exploration, and target tracking, all modeled as POMDPs with a large number of states. In most of the instances studied, SARSOP substantially outperformed one of the fastest existing point-based algorithms. The figure below shows the performance of SARSOP on the well-known Tag problem.

References

• D. Hsu. Towards Large-Scale POMDP Planning for Robotic Tasks. Invited talk at International Conference on Machine Learning (ICML), Workshop on Planning and Acting with Uncertain Models, 2011. Research supported in part by MDA GAMBIT grant R-252-000-398-490, MoE AcRF grant 2010-T2-2-071, and NUS AcRF grant R-252-000-424-112.
Slides
• H. Kurniawati, Y. Du, D. Hsu, and W.S. Lee. Motion planning under uncertainty for robotic tasks with long time horizonsInt. J. Robotics Research, 30(3):308-323, 2011.
PDF
• S.C.W. Ong, S.W. Png, D. Hsu, and W.S. Lee. Planning under uncertainty for robotic tasks with mixed observabilityInt. J. Robotics Research, 29(8):1053–1068, 2010.
PDF
• H. Kurniawati, D. Hsu, and W.S. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Proc. Robotics: Science and Systems, 2008.
PDF
• D. Hsu, W.S. Lee, and N. Rong. A point-based POMDP planner for target tracking. In Proc. IEEE Int. Conf. on Robotics & Automation, pp. 2644–2650, 2008.
PDF
• D. Hsu, W.S. Lee, and N. Rong. What makes some POMDP problems easy to approximate? In Advances in Neural Information Processing Systems (NIPS), 2007.
PDF