Run Run Shaw Library, City University of Hong Kong

Please use this identifier to cite or link to this item: http://dspace.cityu.edu.hk/handle/2031/3668
Full metadata record
DC Field | Value | Language
dc.contributor.author | Mak, Chuen Chung |
dc.date.accessioned | 2006-10-19T07:18:45Z |
dc.date.accessioned | 2017-09-19T08:51:18Z |
dc.date.accessioned | 2019-02-12T06:53:31Z | -
dc.date.available | 2006-10-19T07:18:45Z |
dc.date.available | 2017-09-19T08:51:18Z |
dc.date.available | 2019-02-12T06:53:31Z | -
dc.date.issued | 2006 |
dc.identifier.other | 2006csmcc443 |
dc.identifier.uri | http://144.214.8.231/handle/2031/3668 | -
dc.description.abstract | In this project, we construct a biological sequence database that supports approximate pattern matching. Approximate pattern matching is a fundamental topic in biological research: it is needed to search for a related pattern in biological sequences such as DNA and protein sequences. A pair of sequences is said to be similar if their difference is less than a given threshold. The classical solution to this problem is the dynamic programming approach. Several algorithms with good time complexity are available for matching a pattern against a sequence. However, most of them address the online version of the problem, in which neither the pattern nor the text is known in advance, so they require a sequential scan of the text. Moreover, these algorithms do not consider the I/O requirements or the secondary-storage management of the text. As biological sequence data are growing rapidly, a database that can handle large volumes of data is required. The goal of this project is to construct a database that supports approximate sequence matching with good memory management, low I/O cost and short overall runtime. We studied several algorithms and data structures for pattern matching and developed a system that supports approximate pattern matching using a q-gram index database together with the q-gram filter proposed in a previous study. We implemented the original q-gram filter, which has problems with its memory requirements and accuracy. We then made several improvements to the filter and compared the improved versions with the original. The experimental results show that our improved filters are more efficient than the original one. | en
dc.format.extent | 164 bytes |
dc.format.mimetype | text/html |
dc.rights | This work is protected by copyright. Reproduction or distribution of the work in any format is prohibited without written permission of the copyright owner. |
dc.rights | Access is restricted to CityU users. |
dc.title | Approximate pattern matching | en
dc.contributor.department | Department of Computer Science | en
dc.description.supervisor | Dr. Poon, C K. First Reader: Dr. Wang, Jiying. Second Reader: Prof. Ip, Horace | en
Appears in Collections: Computer Science - Undergraduate Final Year Projects

Files in This Item:
File | Size | Format
fulltext.html | 164 B | HTML


Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.
