Please use this identifier to cite or link to this item:
http://hdl.handle.net/2031/5689
|
| Title: | Bounded-parameter partially observable Markov decision processes |
| Other Titles: | You jie can shu bu fen ke guan ce Maerkefu jue ce guo cheng 有界參數部分可觀測馬爾可夫決策過程 |
| Authors: | Ni, Yaodong (倪耀東) |
| Department: | School of Creative Media |
| Degree: | Doctor of Philosophy |
| Issue Date: | 2009 |
| Publisher: | City University of Hong Kong |
| Subjects: | Markov processes. Decision making -- Mathematical models. Sequential analysis. Uncertainty (Information theory) |
| Notes: | CityU Call Number: QA274.7 .N5 2009 xv, 118 leaves : ill. 30 cm. Thesis (Ph.D.)--City University of Hong Kong, 2009. Includes bibliographical references (leaves 94-104) |
| Type: | thesis |
| Abstract: | In many online applications, it is critical to design an intelligent agent that can make
sequential decisions in a complex, uncertain environment. The partially observable
Markov decision process (POMDP) is considered a fundamental model for sequential
decision making under uncertainty. The POMDP model describes uncertainty not only
in the effects of the agent’s actions but also in the agent’s observations.
However, two obstacles have made it difficult to apply POMDPs to real-world problems:
the lack of knowledge needed to construct a precise underlying model, and the high
computational complexity of finding optimal solutions. In this dissertation, I focus on
the former, while also considering the latter.
In practice, it is usually impossible to describe an environment precisely
with an exact POMDP model, for reasons such as inadequate data for model extraction,
imprecision introduced by consulting human experts, and time-varying environments
that cannot be described by a static POMDP model. However, it is possible to estimate
bounds on the model parameters from the available information. In this
dissertation, I study POMDP problems in which only bounds on the
model parameters are known.
I propose the model of the bounded-parameter partially observable Markov decision
process (BPOMDP), which is a generalization of the classic, exact POMDP model. In
order to solve a BPOMDP, two key problems raised by parameter imprecision must be
addressed: (1) how to formulate the optimality criterion and (2) how to represent a policy.
In this dissertation, I present two solution performance criteria, namely multi-model
admissibility and optimistic optimality. The multi-model admissibility criterion
requires finding an admissible policy when each policy is evaluated with multiple POMDP models.
Under the optimistic optimality criterion, a policy is
optimal if its highest possible total reward is the maximum among all policies. In addition,
I discuss two policy representations, each leading to an approach to solving
BPOMDPs.
The first policy representation, which is based on the concept of the policy tree,
provides explicit policies for finite-horizon problems and approximate solutions for
infinite-horizon problems. I present a modified value iteration as the basic strategy for
generating a multi-model admissible solution, and further propose a UL-based value iteration
(ULVI) algorithm that is convenient to execute and helps reduce computational
costs. Different settings of the subroutines in the ULVI algorithm lead to different performance.
I prove that the reward loss of the solution generated by the
ULVI algorithm originates only from the imprecision in the parameters. Experiments
show that imprecision in the parameters affects both the computational cost and the
reward loss, and that the ULVI algorithm can serve as an approximate method for exact
POMDPs, saving computation at the expense of solution performance.
The second policy representation, the finite-state controller (FSC), is particularly suitable
for infinite-horizon problems, as it occupies only finite memory and allows explicit
policy execution. Based on the theoretical results, I propose a value iteration method for
evaluating FSCs. By representing a policy as an FSC, I develop a policy iteration algorithm
for BPOMDPs. I show that the policy iteration algorithm converges to an
ε-optimal policy under the optimistic optimality criterion in a finite number of iterations.
The BPOMDP model generalizes the traditional POMDP model in the sense
that any exact POMDP model can be represented as a BPOMDP model; the approaches
proposed in this dissertation are therefore applicable to standard POMDP problems. As
a more robust model, the BPOMDP model is expected to find a wider variety of applications
than the POMDP model. In this view, the work presented in this dissertation is of
general significance for sequential decision making under uncertainty. |
| Online Catalog Link: | http://lib.cityu.edu.hk/record=b2374839 |
| Appears in Collections: | SCM - Doctor of Philosophy
|