City University of Hong Kong

Please use this identifier to cite or link to this item: http://hdl.handle.net/2031/5689

Title: Bounded-parameter partially observable Markov decision processes
Other Titles: You jie can shu bu fen ke guan ce Maerkefu jue ce guo cheng
有界參數部分可觀測馬爾可夫決策過程
Authors: Ni, Yaodong (倪耀東)
Department: School of Creative Media
Degree: Doctor of Philosophy
Issue Date: 2009
Publisher: City University of Hong Kong
Subjects: Markov processes.
Decision making -- Mathematical models.
Sequential analysis.
Uncertainty (Information theory)
Notes: CityU Call Number: QA274.7 .N5 2009
xv, 118 leaves : ill. 30 cm.
Thesis (Ph.D.)--City University of Hong Kong, 2009.
Includes bibliographical references (leaves 94-104)
Type: thesis
Abstract: In many online applications, it is critical to design an intelligent agent that can make sequential decisions in a complex, uncertain environment. The partially observable Markov decision process (POMDP) is regarded as a fundamental model for sequential decision making under uncertainty: it describes uncertainty not only in the effects of the agent’s actions but also in the agent’s observations. However, two obstacles have made it difficult to apply POMDPs to real-world problems: the lack of knowledge needed to construct a precise underlying model, and the high computational complexity of obtaining optimal solutions. In this dissertation, I focus on the former, while also considering the latter. In practice, it is usually impossible to describe an environment precisely by an exact POMDP model, for reasons such as inadequate data for model extraction, imprecision introduced by consulting human experts, and time-varying environments that cannot be described by a static POMDP model. It is possible, however, to estimate bounds on the model parameters from the available information. In this dissertation, I study POMDP problems in which only bounds on the model parameters are known. I propose the bounded-parameter partially observable Markov decision process (BPOMDP), a generalization of the classic, exact POMDP model.
To solve a BPOMDP, two key problems arising from parameter imprecision must be addressed: (1) how to formulate the optimality criterion and (2) how to represent a policy. I present two solution performance criteria, namely multi-model admissibility and optimistic optimality. The multi-model admissibility criterion refers to finding an admissible policy given that each policy is evaluated with multiple POMDP models. Under the optimistic optimality criterion, a policy is optimal if its highest possible total reward is the maximum among all policies. In addition, I discuss two policy representations, each leading to an approach to solving BPOMDPs.
The first policy representation, based on the concept of the policy tree, provides explicit policies for finite-horizon problems and approximate solutions for infinite-horizon problems. I present a modified value iteration as the basic strategy for generating a multi-model admissible solution, and further propose a UL-based value iteration (ULVI) algorithm that is convenient to execute and helps reduce computational costs. Different settings of the subroutines in the ULVI algorithm lead to different performance. It is proven that the reward loss of the solution generated by the ULVI algorithm originates only from the imprecision in the parameters. Experiments show that imprecision in the parameters affects both the computational cost and the reward loss, and that the ULVI algorithm can be used as an approximate method for exact POMDPs, saving computation at the expense of solution performance.
The second policy representation, the finite-state controller (FSC), is particularly suitable for infinite-horizon problems: it occupies only finite memory and allows explicit policy execution. Based on the theoretical results, I propose a value iteration method for evaluating FSCs. By representing a policy as an FSC, I develop a policy iteration algorithm for BPOMDPs. It is shown that the policy iteration algorithm converges to an ε-optimal policy under the optimistic optimality criterion in a finite number of iterations.
The BPOMDP model generalizes the traditional POMDP model in the sense that any exact POMDP model can be represented as a BPOMDP model; the approaches proposed in this dissertation are therefore applicable to standard POMDP problems. As a more robust model, the BPOMDP model is expected to find a wider variety of applications than the POMDP model. In this view, the work presented in this dissertation is of general significance for sequential decision making under uncertainty.
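Illustrative note: the "highest possible total reward" in the optimistic optimality criterion depends on choosing, within the parameter bounds, the probabilities most favourable to the agent. The sketch below is not the dissertation's algorithm; it shows only the standard greedy step for interval-bounded probabilities, assuming a single distribution over outcomes with known values. The function names and the example numbers are hypothetical.

```python
from typing import List, Sequence


def optimistic_distribution(lower: Sequence[float],
                            upper: Sequence[float],
                            values: Sequence[float]) -> List[float]:
    """Return p with lower <= p <= upper and sum(p) == 1 that maximizes sum(p * values).

    Assumption: the bounds are intervals on one probability distribution.
    """
    if sum(lower) > 1.0 + 1e-12 or sum(upper) < 1.0 - 1e-12:
        raise ValueError("bounds admit no probability distribution")
    p = list(lower)                 # every outcome gets at least its lower bound
    remaining = 1.0 - sum(p)        # probability mass still to be assigned
    # Hand the spare mass to the highest-value outcomes first.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    for i in order:
        add = min(upper[i] - p[i], remaining)
        p[i] += add
        remaining -= add
    return p


def optimistic_expectation(lower: Sequence[float],
                           upper: Sequence[float],
                           values: Sequence[float]) -> float:
    """Largest expected value achievable within the interval bounds."""
    p = optimistic_distribution(lower, upper, values)
    return sum(pi * vi for pi, vi in zip(p, values))


# Hypothetical bounds on three outcomes and their values.
example_lower = [0.2, 0.3, 0.1]
example_upper = [0.6, 0.5, 0.4]
example_values = [1.0, 5.0, 3.0]
example_p = optimistic_distribution(example_lower, example_upper, example_values)
```

When the lower and upper bounds coincide, the sketch returns the exact distribution, mirroring the abstract's remark that an exact POMDP is a special case of a BPOMDP.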
Online Catalog Link: http://lib.cityu.edu.hk/record=b2374839
Appears in Collections: SCM - Doctor of Philosophy

Files in This Item:

File            Description   Size    Format
abstract.html                 133 B   HTML
fulltext.html                 133 B   HTML

Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 
