Home
Research
Publications
People
Internal
Projects

POMDP Planning

Overview

This project addresses the key challenges in building fast algorithms for solving partially observable Markov decision processes (POMDPs). Some of our POMDP algorithms have been implemented as a software package APPL.

POMDPs provide a principled mathematical framework for planning and decision making under uncertainty. However, POMDPs are often avoided in practice, because solving POMDPs exactly is computationally intractable. Not long ago, the best algorithms could spend hours computing exact solutions to POMDPs with only a dozen states. In recent years, point-based algorithms have made impressive progress in computing approximately optimal solutions for large POMDPs: POMDPs with hundreds of states have been solved in a matter of seconds.  This project consists of two main thrusts:
  • understand why the point-based algorithms work well, whether there are sub-classes of POMDPs that are computationally easier, and what complexity measures are suitable for capturing the difficulty of POMDP planning for point-based algorithms, and
  • develop efficient point-based POMDP algorithms by exploting the insights from the theoretical analysis.

Software

Apr 15, 2008
APPL, our software package for solving POMDPs, is available for download here.

The Covering Number and The Complexity of POMDP Planning

Intuitively, the computational  intractability is due to the "curse of dimensionality": the belief space B used in solving a POMDP typically has dimensionality equal to |S|, the number of states in the POMDP, and therefore the size of B grows exponentially with |S|. As a result, the number of states is often used as an important complexity measure for POMDP planning. However, if the number of states were an accurate measure of the "true" complexity of a POMDP, it would seem surprising that point-based algorithms can obtain (approximate) solutions in seconds in belief spaces of hundreds of dimensions.

We propose to use the covering number as an alternative measure of the complexity of POMDP planning. As expected, not all belief spaces have the same complexity even if they have the same dimenionality. Consider the example below. Although B and B' are both two-dimensional, it is intuitively clear that the size of B', which consists only three points, is much smaller than of that B.
The covering number captures several interesting structural properties that drastically reduce the size of the belief space, such as sparsity and factorization. Using the idea of the covering number, we have identified several conditions under which approximate POMDP solutions can be computed efficiently.

SARSOP

One key idea of point-based POMDP algorithms is to sample a set of points from a belief space B and use it as an approximate representation of B, instead of representing B exactly.  Some early POMDP algorithms sample the entire belief space B, using a uniform sampling distribution, such as a grid.  However, it is difficult to sample a representative set of points from B, due to its large size.  More recent point-based algorithms sample only R, the subset of belief points reachable from a given initial point b0 in B, under arbitrary sequences of actions.  It is generally believed that R is much smaller than B. Indeed, focusing on R allows point-based algorithms to scale up to  larger problems. To push further in this direction, we would like to sample  near R*, the subset of belief points reachable from b0  under optimal sequences of actions, as  R* is usually much smaller than R. Of course, the optimal sequences of actions constitute exactly the POMDP solution, which is unknown in advance.  The main idea of our algorithm is to compute successive approximations of R* through sampling and converge to it iteratively. Hence the name SARSOP, which stands for Successive Approximations of the Reachable Space under Optimal Policies.
navigation grasping target tracking
integrated exploration

In simulation, we successfully applied SARSOP to a set of common robotic tasks, including instances of coastal navigation, grasping, mobile robot exploration, and target tracking, all modeled as POMDPs with a large number of states. In most of the instances studied, SARSOP substantially outperformed one of the fastest existing point-based algorithms.  The figure below shows the performance of SARSOP on the well-known Tag problem.

References

  • H. Kurniawati, D. Hsu, and W.S. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Proc. Robotics: Science and Systems, 2008.
    BibTeX   PDF 
  • D. Hsu, W.S. Lee, and N. Rong. A point-based POMDP planner for target tracking. In Proc. IEEE Int. Conf. on Robotics & Automation, pp. 2644–2650, 2008.
    BibTeX   PDF 
  • D. Hsu, W.S. Lee, and N. Rong. What makes some POMDP problems easy to approximate?. In Advances in Neural Information Processing Systems (NIPS), 2007.
    BibTeX   PDF 

People

  Yanzhu Du
  David Hsu
  Xan Huang
  Hanna Kurniawati
  Wee Sun Lee
  Sylvie Ong
  Shaowei Png


 

Home » Research