Please use this identifier to cite or link to this item:
http://dspace.cityu.edu.hk/handle/2031/7514
Title: | Redundancy-free information retrieval on multiple XML documents with data semantics preservation |
Authors: | Chan, Kam Lam |
Department: | Department of Computer Science |
Issue Date: | 2014 |
Supervisor: | Supervisor: Dr. Fong, Shi Piu Joseph; First Reader: Dr. Ngo, Chong Wah; Second Reader: Dr. Chan, Edward |
Abstract: | Extensible Markup Language (XML) plays a vital role in data exchange on the Internet. It can hold a large amount of information and simplifies data storage and sharing in plain-text format. However, multiple XML schema declarations and large volumes of structured data may affect usability. Consolidating multiple XML documents from multiple XML databases to generate a report takes a long time, because a customized program must be developed to parse the documents, extract the data, and integrate the data into a single document. In addition, problems may occur during integration, such as rearrangement of data semantics and redundant data for matching elements. A solution is therefore needed that accepts a global query to retrieve multiple XML documents and reduces the effort of program maintenance. This project develops a prototype that accepts global query input, based on an integrated schema, to retrieve multiple XML documents stored in different XML databases. The approach has three steps: schema integration, query decomposition, and data integration. The proposed prototype handles three cases of schema integration: zero, one, and multiple matching elements. Schema integration is automated and merges two XML schemas into one integrated schema. The prototype then accepts global XQuery input as Path and FLWOR expressions. The global query may be decomposed into two sub-queries that access different XML documents. Data integration is also automated and creates an integrated document by integrating data according to the query input. To preserve data semantics and eliminate redundant data, the project designed three techniques: artificial root creation, reversed sub-tree, and key/keyRef. |
Appears in Collections: | Computer Science - Undergraduate Final Year Projects |
Files in This Item:
File | Size | Format |
---|---|---|
fulltext.html | 146 B | HTML |
Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.