Please use this identifier to cite or link to this item:
http://dspace.cityu.edu.hk/handle/2031/8708
Title: | Virus Integration Detection: From Algorithm to Implementation |
Authors: | Xu, Chang |
Department: | Department of Computer Science |
Issue Date: | 2016 |
Supervisor: | Dr. Li, Shuaicheng; First Reader: Dr. Xu, Hong Henry; Second Reader: Prof. Jia, Xiaohua |
Abstract: | Apart from causing a wide variety of acute diseases, viruses have also been found to account for up to 15% of human cancers. One notorious example is Human Papillomavirus (HPV), which is associated with more than 95% of cervical cancers. Through sequencing, researchers have found evidence of viral sequences in human DNA. It appears that viruses are able to integrate their genetic material into the host genome and disrupt local genomic stability. If the integrated virus happens to increase the expression of oncogenes or decrease the expression of tumor suppressor genes, cancer is likely to develop. Viral integration reshuffles the host DNA flanking the integration site by causing minor or catastrophic amplifications and rearrangements of the host genome. Reconstructing the virus-infected region has therefore become a research interest of bioinformatics scientists. However, constrained by the limited read lengths of today's sequencing methods, researchers have had to resort to manual reconstruction to resolve the focal DNA structure. In this project, we have developed an algorithm that automatically deduces the local genomic map that is likely to exist in the region of interest, based on the sequencing coverage of host, virus and host-virus junction segments. With the help of single-base-resolution next-generation sequencing data, the algorithm sheds light on the possible consequences of the structural transformations induced by virus integration. We tested the algorithm and compared its results with those published in academic papers. We are confident of the workability of our algorithm and have laid out plans for future refinement. We also introduce Flowsmart, a comprehensive backend pipeline management tool that automatically schedules multi-step, large-scale analytical pipelines which, if managed manually, may consume a considerable number of man-hours for job submission and error handling.
Flowsmart has successfully safeguarded a number of bioinformatics pipelines that involve terabyte-scale volumes of files, in which each sample alone takes days to process. Rather than being a dedicated solution for a specific pipeline, Flowsmart is a promising framework for managing potentially all kinds of pipelines. It can be deployed as pipeline-aware "middleware" located between the user and a job-level management system such as Sun Grid Engine. |
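The record does not describe how coverage is turned into a genomic map. As a hedged illustration only, and not the author's algorithm, one ingredient commonly used in coverage-based reconstruction is to convert each segment's mean read depth into an integer copy number and then check that every segment end is "balanced", i.e. its copy number equals the total copies of the adjacencies (reference adjacencies or host-virus junctions) attached to it. The function names `estimate_copy_numbers` and `balance_residuals`, the `haploid_depth` parameter and the (segment, side) encoding are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical encoding: a segment end is (segment_name, "L" or "R").
End = Tuple[str, str]
# An adjacency joins two segment ends and is present on `copies` copies.
Adjacency = Tuple[End, End, int]


def estimate_copy_numbers(mean_depth: Dict[str, float], haploid_depth: float) -> Dict[str, int]:
    """Round each segment's mean depth, in units of one-copy depth, to an integer copy number."""
    if haploid_depth <= 0:
        raise ValueError("haploid_depth must be positive")
    return {seg: max(0, round(depth / haploid_depth)) for seg, depth in mean_depth.items()}


def balance_residuals(copy_numbers: Dict[str, int], adjacencies: List[Adjacency]) -> Dict[End, int]:
    """Segment copy number minus total adjacency copies at each attached end.

    Ends with no adjacency (e.g. the borders of the analysed region) are not reported.
    A residual of 0 means that end is fully explained by its adjacencies.
    """
    attached: Dict[End, int] = defaultdict(int)
    for end_a, end_b, copies in adjacencies:
        attached[end_a] += copies
        attached[end_b] += copies
    return {end: copy_numbers[end[0]] - total for end, total in attached.items()}


# Toy region (invented numbers): host segments H1 and H2 flank a virus segment V.
cn = estimate_copy_numbers({"H1": 60.0, "V": 30.0, "H2": 62.0}, haploid_depth=30.0)
adjacencies = [
    (("H1", "R"), ("H2", "L"), 1),  # reference adjacency kept on one allele
    (("H1", "R"), ("V", "L"), 1),   # host-virus junction
    (("V", "R"), ("H2", "L"), 1),   # virus-host junction
]
print(cn)
print(balance_residuals(cn, adjacencies))
```

With a one-copy depth of 30x, the host segments at 60x and 62x and the virus segment at 30x receive copy numbers 2, 2 and 1; if one allele keeps the reference adjacency and the other carries the viral insertion, all four residuals are zero. A nonzero residual would flag an end that the proposed map fails to explain; a real method would also need to model depth noise and search over candidate maps rather than simply rounding.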
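Flowsmart's internals are not described in this record. The sketch below only illustrates the general idea of pipeline-level awareness that the abstract mentions: steps run in dependency order, transient failures are retried, and steps whose prerequisites failed are skipped rather than submitted. It runs local Python callables instead of submitting jobs to Sun Grid Engine, and `run_pipeline`, `max_retries` and the status strings are hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable


def run_pipeline(
    steps: Dict[str, Callable[[], object]],
    deps: Dict[str, Iterable[str]],
    max_retries: int = 2,
) -> Dict[str, str]:
    """Run steps in dependency order; return "done", "failed" or "skipped" per step."""
    graph = {name: set(deps.get(name, ())) for name in steps}
    unknown = set().union(*graph.values()) - set(steps)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    status: Dict[str, str] = {}
    # static_order() yields every step after all of its prerequisites
    # (and raises graphlib.CycleError if the dependencies contain a cycle).
    for name in TopologicalSorter(graph).static_order():
        if any(status[dep] != "done" for dep in graph[name]):
            status[name] = "skipped"  # never start a step whose inputs are missing
            continue
        for _ in range(max_retries + 1):
            try:
                steps[name]()
                status[name] = "done"
                break
            except Exception:  # in this sketch, any exception counts as a failed attempt
                status[name] = "failed"
    return status


attempts = {"call": 0}


def flaky_call() -> None:
    attempts["call"] += 1
    if attempts["call"] < 2:
        raise RuntimeError("transient failure")


def broken_annotate() -> None:
    raise RuntimeError("persistent failure")


result = run_pipeline(
    steps={"align": lambda: None, "call": flaky_call, "annotate": broken_annotate, "report": lambda: None},
    deps={"call": ["align"], "annotate": ["call"], "report": ["annotate"]},
    max_retries=1,
)
print(result)
```

With `max_retries=1`, the step that fails once and then succeeds ends as "done", the step that fails on both attempts is marked "failed", and the step downstream of it is "skipped". A grid-engine-backed version would replace the direct call with job submission and inspect each job's exit status instead of catching Python exceptions.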
Appears in Collections: | Computer Science - Undergraduate Final Year Projects |
Files in This Item:
File | Size | Format
---|---|---
fulltext.html | 145 B | HTML
Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.