Redundancy-free information retrieval on multiple XML documents with data semantics preservation

Chan, Kam Lam

Please use this identifier to cite or link to this item: http://dspace.cityu.edu.hk/handle/2031/7514

Title:	Redundancy-free information retrieval on multiple XML documents with data semantics preservation
Authors:	Chan, Kam Lam
Department:	Department of Computer Science
Issue Date:	2014
Supervisor:	Supervisor: Dr. Fong, Shi Piu Joseph; First Reader: Dr. Ngo, Chong Wah; Second Reader: Dr. Chan, Edward
Abstract:	Extensible Markup Language (XML) plays a vital role in data exchange on the Internet. It can hold large amounts of information and simplifies data storage and sharing in a plain-text format. However, multiple XML schema declarations and large volumes of structured data may affect usability. Consolidating multiple XML documents from multiple XML databases to generate a report requires considerable time to develop a customized program that parses the documents, extracts the data, and integrates it into a single document. In addition, problems may occur during integration, such as rearrangement of data semantics and redundant data in matching elements. A solution is therefore needed that accepts a global query to retrieve multiple XML documents and reduces the effort of program maintenance. This project develops a prototype that accepts a global query, based on an integrated schema, to retrieve multiple XML documents stored in different XML databases. The process comprises three steps: schema integration, query decomposition, and data integration. The prototype handles three cases of schema integration: zero, one, and multiple matching elements. Schema integration is automated and merges two XML schemas into one integrated schema. The prototype then accepts a global XQuery input as a path or FLWOR expression. The global query may be decomposed into two subqueries that access different XML documents. Data integration is likewise automated and creates an integrated document by integrating data based on the query input. To preserve data semantics and eliminate redundant data, three techniques were designed: artificial root creation, reversed sub tree, and key/keyRef.
Appears in Collections:	Computer Science - Undergraduate Final Year Projects
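
The abstract names artificial root creation and the removal of redundant matching elements without giving implementation details. The following is only a minimal sketch of the general idea, not the project's actual method: it assumes two small hypothetical documents, a hypothetical matching element `customer`, and a hypothetical key attribute `id`. The function name `integrate` and the root name `integrated` are likewise assumptions made for illustration. The reversed sub tree technique and schema-level key/keyRef constraints are not covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; element and attribute names are
# assumptions for illustration, not taken from the project.
DOC_A = """<customers>
  <customer id="c1"><name>Ann</name></customer>
  <customer id="c2"><name>Bob</name></customer>
</customers>"""

DOC_B = """<clients>
  <customer id="c2"><name>Bob</name></customer>
  <customer id="c3"><name>Cy</name></customer>
</clients>"""


def integrate(xml_a, xml_b, match_tag, key_attr, root_tag="integrated"):
    """Collect matching elements from two documents under an artificial
    root, keeping only the first element seen for each key value."""
    root = ET.Element(root_tag)  # artificial root for the merged result
    seen = set()
    for source in (xml_a, xml_b):
        for elem in ET.fromstring(source).iter(match_tag):
            key = elem.get(key_attr)
            if key is not None:
                if key in seen:
                    continue  # redundant copy of an already merged element
                seen.add(key)
            # Elements without a key cannot be compared, so they are kept.
            root.append(elem)
    return root


merged = integrate(DOC_A, DOC_B, "customer", "id")
ids = [c.get("id") for c in merged.findall("customer")]
print(ids)  # ['c1', 'c2', 'c3']
```

In this sketch the customer with key `c2` appears in both sources but only once in the result, and the two different source roots (`customers`, `clients`) are replaced by a single artificial root so the output remains a well-formed document.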
