|
|
POMDP Planning
Overview
This project addresses the key challenges in building
fast algorithms for solving partially observable Markov decision
processes (POMDPs). Some of our POMDP algorithms have been implemented
as a software package
APPL.
POMDPs provide a
principled mathematical framework for planning and decision making
under uncertainty. However, POMDPs are often avoided in practice,
because solving
POMDPs exactly is computationally intractable. Not long ago, the best
algorithms could spend hours computing exact solutions to POMDPs with
only a dozen states. In recent years, point-based algorithms have made
impressive progress in computing approximately optimal solutions
for large POMDPs: POMDPs with
hundreds of states have been solved in a matter of seconds.
This project consists of two main thrusts:
- understand why the point-based algorithms work well,
whether
there are sub-classes of POMDPs that are computationally easier, and
what complexity measures are suitable for capturing the
difficulty of POMDP
planning for point-based algorithms, and
- develop efficient point-based POMDP algorithms by
exploting the insights from the theoretical analysis.
|
|
|
The Covering Number and The Complexity of POMDP Planning
Intuitively, the computational intractability is due to the
"curse of dimensionality": the belief space B used in solving a
POMDP
typically has dimensionality equal to |S|, the number of
states in the POMDP, and therefore the size of B grows
exponentially with |S|.
As a result, the number of states is often used as an
important complexity measure for POMDP planning. However, if the
number of states were an accurate measure of the "true" complexity of a
POMDP, it would seem surprising that point-based algorithms can obtain
(approximate) solutions in seconds in belief spaces of hundreds of
dimensions.
We propose to use the covering
number as
an alternative measure of the complexity of POMDP planning. As
expected, not all belief spaces have the same complexity even if they
have the same dimenionality. Consider the example below. Although B and B' are both
two-dimensional, it is intuitively clear that the size of B', which consists
only three points, is much smaller than of that B.
The
covering number captures several interesting structural properties that
drastically reduce the size of the belief space, such as sparsity and
factorization. Using the idea of the covering number, we have
identified several conditions under which approximate POMDP solutions
can be computed efficiently.
SARSOP
One key idea of point-based POMDP algorithms is to sample a set of
points from a belief space B
and use it as an approximate representation of B, instead of
representing B
exactly. Some early POMDP algorithms sample the entire belief
space B,
using a uniform sampling distribution, such as a grid.
However,
it is difficult to sample a representative set of points from B, due to its large
size. More recent point-based algorithms sample only R, the subset of
belief points reachable from a given initial point b0
in B,
under arbitrary sequences of actions. It is generally
believed that R
is much smaller than B.
Indeed, focusing on R
allows point-based algorithms to scale up to larger problems.
To
push further in this direction, we would like to sample near R*, the subset of
belief points reachable from b0
under optimal
sequences of actions, as R*
is usually much smaller than R.
Of course, the optimal sequences of actions constitute exactly the
POMDP solution, which is unknown in advance. The
main idea
of our algorithm is to compute successive approximations of R*
through sampling and converge to it iteratively. Hence the name SARSOP,
which stands for Successive Approximations of the Reachable Space under
Optimal Policies.
 |
 |
 |
| navigation |
grasping |
target
tracking |
|
 |
|
|
integrated
exploration |
|
In
simulation, we successfully applied SARSOP to a set of common robotic
tasks, including instances of coastal navigation, grasping, mobile
robot exploration, and target
tracking,
all modeled as POMDPs with a large number of states. In most of the
instances studied, SARSOP substantially outperformed one of the fastest
existing point-based algorithms. The figure below shows the
performance of SARSOP on the well-known Tag problem.
References
- H. Kurniawati, D. Hsu, and W.S. Lee. SARSOP:
Efficient point-based POMDP planning by approximating optimally
reachable belief spaces. In Proc. Robotics: Science
and Systems, 2008.
BibTeX PDF
- D. Hsu, W.S. Lee, and N. Rong. A point-based
POMDP planner for target tracking. In Proc. IEEE
Int. Conf. on Robotics & Automation, pp. 2644–2650,
2008.
BibTeX PDF
- D. Hsu, W.S. Lee, and N. Rong. What makes some
POMDP problems easy to approximate?. In Advances in
Neural Information Processing Systems (NIPS), 2007.
BibTeX PDF
People
Home
»
Research
|
|