City University of Hong Kong

Title: Indexing and query processing of XML documents
Other Titles: Ke kuo zhan biao ji yu yan de suo yin yu cha xun chu li (English: Indexing and query processing of the Extensible Markup Language)
Authors: Yuen, Chi Hang (袁智恆)
Department: Dept. of Computer Science
Degree: Master of Philosophy
Issue Date: 2006
Publisher: City University of Hong Kong
Subjects: Indexing
Querying (Computer science)
XML (Document markup language)
Notes: 86 leaves : ill. ; 30 cm.
CityU Call Number: QA76.76.H94 Y84 2006
Includes bibliographical references (leaves 76-85)
Thesis (M.Phil.)--City University of Hong Kong, 2006
Type: Thesis
Abstract: The Extensible Markup Language (XML) [Con00] is becoming the de facto standard for representing and exchanging information over the Internet. Owing to its hierarchical (recursive) and self-describing syntax, XML is flexible enough to express a wide variety of information. To retrieve useful information from XML, queries written in a query language such as XPath are used to select the elements that satisfy given criteria. An XPath expression consists of a sequence of location steps, each made up of an axis, a node test, and optionally one or more predicates. The axis specifies the structural relationship between the elements selected by two adjacent location steps, the node test restricts the names or types of the elements selected in a step, and a predicate imposes further criteria.

This thesis studies the indexing and query processing of XML documents. First, we design an efficient indexing structure for XML documents that supports every basic XPath axis step. The structure is built on top of the B+-tree, which is available in practically all commercial relational database systems. For most of the basic axis steps, we derive theoretical worst-case bounds on execution time, and we substantiate these bounds experimentally.

Second, we study XML twig pattern matching algorithms and design an enhancement to TJFast, a state-of-the-art algorithm for this problem. The enhancement reduces CPU cost and can also take advantage of indexed inputs. Our algorithm can be shown analytically to be as efficient as TJFast in terms of worst-case I/O, and in experiments it performs significantly better.
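To make the idea of "supporting an axis step with an ordered index" concrete, the sketch below evaluates single location steps (axis, node test, optional predicate) over elements labelled with (start, end, level) region codes and kept sorted by `start`. The abstract does not say which labelling scheme the thesis's B+-tree index uses, so the region encoding here is an assumption (a common textbook choice), Python's `bisect` on a sorted list only stands in for a B+-tree range scan, and the names `Region`, `label`, and `axis_step` are hypothetical.

```python
import bisect
import xml.etree.ElementTree as ET
from typing import Callable, List, NamedTuple, Optional


class Region(NamedTuple):
    start: int  # counter value when the element opens
    end: int    # counter value when the element closes
    level: int  # depth in the tree; the root is level 0
    tag: str


def label(root: ET.Element) -> List[Region]:
    """Assign (start, end, level) region codes in one depth-first walk.

    Elements are appended in pre-order, so the returned list is already
    sorted by `start`, the key an ordered index would be built on.
    """
    regions: List[Optional[Region]] = []
    counter = 0

    def walk(node: ET.Element, level: int) -> None:
        nonlocal counter
        start = counter
        counter += 1
        slot = len(regions)
        regions.append(None)  # filled in once the closing position is known
        for child in node:
            walk(child, level + 1)
        regions[slot] = Region(start, counter, level, node.tag)
        counter += 1

    walk(root, 0)
    return [r for r in regions if r is not None]


def axis_step(index: List[Region], starts: List[int], ctx: Region,
              axis: str, name: str,
              pred: Optional[Callable[[Region], bool]] = None) -> List[Region]:
    """Evaluate one location step: axis, node test (`name`, '*' = any), predicate."""
    if axis in ("child", "descendant"):
        # Descendants are exactly the elements whose start lies strictly
        # between ctx.start and ctx.end: one contiguous range on the sorted keys.
        lo = bisect.bisect_right(starts, ctx.start)
        hi = bisect.bisect_left(starts, ctx.end)
        cands = index[lo:hi]
        if axis == "child":
            cands = [r for r in cands if r.level == ctx.level + 1]
    elif axis == "ancestor":
        # Ancestors open before ctx and close after it. This sketch simply
        # scans every element that opens before ctx.
        hi = bisect.bisect_left(starts, ctx.start)
        cands = [r for r in index[:hi] if r.end > ctx.end]
    else:
        raise ValueError(f"axis not covered by this sketch: {axis}")
    matched = [r for r in cands if name == "*" or r.tag == name]  # node test
    return [r for r in matched if pred is None or pred(r)]        # predicate


root = ET.fromstring("<a><b><c/></b><c/><b/></a>")
index = label(root)
starts = [r.start for r in index]
print([r.start for r in axis_step(index, starts, index[0], "descendant", "c")])  # [2, 5]
print([r.start for r in axis_step(index, starts, index[0], "child", "c")])       # [5]
```

Sorting by `start` turns the descendant axis into a single range query, which is why an ordered structure such as a B+-tree is a natural host for this kind of labelling; the child axis reuses the same range with a level filter. Other axes (following, preceding, sibling axes) would need further conditions on the region codes and are left out of this sketch.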
Appears in Collections: CS - Master of Philosophy

Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.