Date of Award:

5-2011

Document Type:

Thesis

Degree Name:

Master of Science (MS)

Department:

Computer Science

Committee Chair(s)

Daniel L. Bryce

Committee

Daniel L. Bryce

Committee

Vicki H. Allan

Committee

Daniel W. Watson

Abstract

Partially-observable Markov decision processes (POMDPs) are especially good at modeling real-world problems because they allow for sensor and effector uncertainty. Unfortunately, such uncertainty makes solving a POMDP computationally challenging. Traditional approaches, which are based on value iteration, can be slow because they find optimal actions for every possible situation. With the help of the Fast Forward (FF) planner, FF-Replan and FF-Hindsight have shown success in quickly solving fully-observable Markov decision processes (MDPs) by solving classical planning translations of the problem. This thesis extends the concept of problem determinization to POMDPs by sampling action observations (similar to how FF-Replan samples action outcomes) and guiding the construction of policy trajectories with a conformant (as opposed to classical) planning heuristic. The resultant planner is called POND-Hindsight.

A number of technical approaches had to be employed within the planner, namely, 1) translating expected reward into a probability of goal satisfaction criterion, 2) monitoring belief states with a Rao-Blackwellized particle filter, and 3) employing Rao-Blackwellized particles in the McLUG probabilistic conformant planning graph heuristic. POND-Hindsight is an action selection mechanism that evaluates each possible action by generating a number of lookahead samples (up to a fixed horizon). Each sample greedily selects actions based on their heuristic value and samples an observation for each action taken; the average goal satisfaction probability of the end-horizon belief states is used as the value of each action.
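The action selection loop described above can be illustrated with a minimal sketch. This is not the thesis implementation: the two-action toy domain, the scalar belief (probability of being in the goal state), the 0.8 sensor accuracy, and the names `select_action`, `toy_step`, `toy_heuristic`, and `goal_prob` are all assumptions made for illustration. In particular, `toy_heuristic` is a simple stand-in for the McLUG heuristic, and the exact Bayes update stands in for the Rao-Blackwellized particle filter.

```python
import random


def toy_step(belief, action, rng):
    """Apply an action to a scalar belief, then sample an observation and update.

    Hypothetical toy model: "work" reaches the goal with probability 0.5 from a
    non-goal state, "idle" changes nothing, and the sensor is right 80% of the time.
    """
    if action == "work":
        belief = belief + 0.5 * (1.0 - belief)
    p_obs_goal = 0.8 * belief + 0.2 * (1.0 - belief)
    if rng.random() < p_obs_goal:  # sampled observation: "looks like goal"
        return 0.8 * belief / p_obs_goal
    return 0.2 * belief / (1.0 - p_obs_goal)  # sampled observation: "not goal"


def toy_heuristic(belief, action):
    """Stand-in heuristic (assumption): goal probability right after the action."""
    return belief + 0.5 * (1.0 - belief) if action == "work" else belief


def goal_prob(belief):
    """Goal satisfaction probability of a belief state (here the belief itself)."""
    return belief


def select_action(belief, actions, step, heuristic, goal_prob,
                  num_samples=10, horizon=5, rng=None):
    """Value each action by the average goal probability over sampled lookaheads."""
    rng = rng or random.Random(0)
    values = {}
    for first in actions:
        total = 0.0
        for _ in range(num_samples):
            b = step(belief, first, rng)  # candidate action plus sampled observation
            for _ in range(horizon - 1):
                # Greedy lookahead: follow the action with the best heuristic value.
                greedy = max(actions, key=lambda a: heuristic(b, a))
                b = step(b, greedy, rng)
            total += goal_prob(b)  # evaluate the end-horizon belief state
        values[first] = total / num_samples
    best = max(actions, key=lambda a: values[a])
    return best, values


best, values = select_action(0.0, ["work", "idle"], toy_step, toy_heuristic,
                             goal_prob, num_samples=500, horizon=2,
                             rng=random.Random(0))
print(best, values)
```

Because each lookahead follows a single sampled observation sequence rather than branching over all of them, the cost per candidate action grows with the number of samples times the horizon, not with the size of the full policy tree.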

POND-Hindsight was entered into the POMDP track of the 2011 International Probabilistic Planning Competition (IPPC) and performed comparably to its competitors, ranking in the middle of six planners. Benchmarks on the IPPC-2011 problems were run on a cluster of identical computers in order to evaluate computation time and plan quality. Success can be attributed to determinization of the problem, and failure can be attributed to a sometimes misleading heuristic combined with a greedy best-first lookahead algorithm.

Checksum

a27029018f3345bb9414b03746ebb520

Comments

This work was made publicly available electronically on September 29, 2011.

Recommended Citation

Olsen, Alan, "Pond-Hindsight: Applying Hindsight Optimization to Partially-Observable Markov Decision Processes" (2011). All Graduate Theses and Dissertations, Spring 1920 to Summer 2023. 1035.
https://digitalcommons.usu.edu/etd/1035

Included in

Computer Sciences Commons

Copyright for this work is retained by the student. If you have any questions regarding the inclusion of this work in the Digital Commons, please email us at .

DOI

https://doi.org/10.26076/7b21-a77b

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Pond-Hindsight: Applying Hindsight Optimization to Partially-Observable Markov Decision Processes
