Approximate pattern matching

Mak, Chuen Chung

Please use this identifier to cite or link to this item: http://dspace.cityu.edu.hk/handle/2031/3668

Full metadata record

DC Field	Value	Language
dc.contributor.author	Mak, Chuen Chung
dc.date.accessioned	2006-10-19T07:18:45Z
dc.date.accessioned	2017-09-19T08:51:18Z
dc.date.accessioned	2019-02-12T06:53:31Z	-
dc.date.available	2006-10-19T07:18:45Z
dc.date.available	2017-09-19T08:51:18Z
dc.date.available	2019-02-12T06:53:31Z	-
dc.date.issued	2006
dc.identifier.other	2006csmcc443
dc.identifier.uri	http://144.214.8.231/handle/2031/3668	-
dc.description.abstract	This project constructs a biological sequence database that supports approximate pattern matching. Approximate pattern matching is a fundamental topic in biological research, where it is used to search for related patterns in biological sequences such as DNA and protein sequences. A pair of sequences is said to be similar if their difference is less than a given threshold. The classical solution to this problem is the dynamic programming approach. Several time-efficient algorithms are available for matching a pattern in a sequence. However, most of them address the online version of the problem, in which neither the pattern nor the text is known in advance, so they require a sequential scan of the text. Moreover, those algorithms do not consider the I/O requirements or the secondary storage management of the text. As biological sequence data is growing rapidly, a database for large data sets is required. The goal of this project is to construct a database that supports approximate sequence matching with good memory management, low I/O cost and short overall runtime. We studied several algorithms and data structures for pattern matching and developed a system that supports approximate pattern matching using a q-gram index database with the q-gram filter proposed in a previous study. We implemented the original q-gram filter, which has several problems with memory requirements and accuracy. We made several improvements to the filter and compared the improved versions with the original. The experimental results show that our improved filters are more efficient than the original.	en
dc.format.extent	164 bytes
dc.format.mimetype	text/html
dc.rights	This work is protected by copyright. Reproduction or distribution of the work in any format is prohibited without written permission of the copyright owner.
dc.rights	Access is restricted to CityU users.
dc.title	Approximate pattern matching	en
dc.contributor.department	Department of Computer Science	en
dc.description.supervisor	Dr. Poon, C K. First Reader: Dr. Wang, Jiying. Second Reader: Prof. Ip, Horace	en
Appears in Collections:	Computer Science - Undergraduate Final Year Projects
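
Illustrative sketch (not taken from the thesis): the abstract names two techniques, dynamic-programming approximate matching and a q-gram index with a q-gram filter. The Python below shows one common textbook way to combine them, assuming edit distance as the similarity measure and the standard q-gram lemma (an occurrence of a length-m pattern with at most k errors shares at least m - q + 1 - k*q q-grams with the pattern). The function names, the in-memory index, and the diagonal-counting scheme are assumptions made for illustration; the thesis's own filter, its improvements, and its on-disk layout are not described in this record.

```python
from collections import defaultdict


def build_qgram_index(text, q):
    """Map every q-gram of text to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index


def edit_distance_search(pattern, text, k):
    """Sequential dynamic-programming scan (Sellers-style).

    Returns the end positions (exclusive) j such that some substring of
    text ending at j is within edit distance k of pattern.
    """
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(text, 1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra text character
                         cur[i - 1] + 1)      # missing pattern character
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


def qgram_filter_search(pattern, text, k, q, index=None):
    """Filter candidate regions with the q-gram lemma, then verify by DP."""
    m = len(pattern)
    threshold = m - q + 1 - k * q  # q-gram lemma lower bound
    if threshold < 1:
        # The filter cannot prune anything; fall back to a full scan.
        return sorted(set(edit_distance_search(pattern, text, k)))
    if index is None:
        index = build_qgram_index(text, q)

    # Count q-gram hits per diagonal (text position minus pattern offset).
    diag_hits = defaultdict(int)
    for i in range(m - q + 1):
        for p in index.get(pattern[i:i + q], ()):
            diag_hits[p - i] += 1

    ends = set()
    for d in list(diag_hits):
        # With at most k errors, the shared q-grams of one occurrence lie
        # on at most k + 1 consecutive diagonals.
        hits = sum(diag_hits.get(d + s, 0) for s in range(k + 1))
        if hits >= threshold:
            lo = max(0, d - k)
            hi = min(len(text), d + k + m)
            # Verify only the candidate region with the DP scan.
            for j in edit_distance_search(pattern, text[lo:hi], k):
                ends.add(lo + j)
    return sorted(ends)
```

In this sketch the index is built once per text and reused across queries, so only the candidate regions that pass the filter are scanned by dynamic programming. That is the general motivation for index-based (offline) matching that the abstract contrasts with sequential online scanning.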

Files in This Item:

File	Size	Format
fulltext.html	164 B	HTML
