City University of Hong Kong

Please use this identifier to cite or link to this item: http://hdl.handle.net/2031/6212

Title: Functional analysis of gene products based on gene ontology
Other Titles: Ji yu gene ontology de dan bai zhi gong neng fen xi
基於 gene ontology 的蛋白質功能分析
Authors: Tan, Lirong (譚麗容)
Department: Department of Computer Science
Degree: Master of Philosophy
Issue Date: 2010
Publisher: City University of Hong Kong
Subjects: Genetics -- Data processing.
Proteins -- Analysis.
Bioinformatics.
Notes: CityU Call Number: QH441.2 .T36 2010
xii, 80 leaves : col. ill. 30 cm.
Thesis (M.Phil.)--City University of Hong Kong, 2010.
Includes bibliographical references (leaves 69-80)
Type: thesis
Abstract: With the rapid advances in genomic sequencing, large numbers of biological sequences are becoming available. Experimental methods for functional interpretation, on the other hand, are inherently slow. As a result, the gap between sequence and function continues to widen. Under such circumstances, computational analysis of the function of biological sequences has become one of the most important tasks in bioinformatics. Before any computational analysis, one problem that must first be addressed is the elusive definition of protein function. Scientists from different fields may use different words to describe the same functional property, which makes functional descriptions much harder for machines to interpret. In an attempt to unify the definitions of functional properties, many functional classification systems have been proposed. Gene Ontology (GO) is one of them. Unlike all other schemes, which were tailored to specific genomes even if they were generalized later, GO set out with the goal of providing a common functional ontology across the whole biological world. This underlying shift in ideology led to the rapid recognition of GO as the most successful functional scheme, and it is therefore widely used in functional analysis. As a preliminary exploration of GO-based functional analysis, we have concentrated our studies on two main aspects. First, we have tried to predict protein function from domain content using machine learning techniques. Two new models are proposed: a Correlation Coefficient based model (CC-M) and a Support Vector Machine (SVM) based model (SVM-M). We have developed our models in the form of predictors for all GO terms with manually curated annotations. In comparison with the previously published Bayesian probabilistic approach, our methods are shown to cope better with incomplete training data.
In particular, the CC-M method is suitable for GO terms with extremely low occurrence frequency, and the SVM-M method for the remaining GO terms. CC-M and SVM-M are therefore integrated into a single model (CC-SVM) that combines their respective advantages. With the CC-SVM based predictors, predictions have been made for two GO annotations, transcription factor (TF) binding activity (GO:0008134) and nucleus (GO:0005634), for proteins in the myosin family. Eighteen myosin proteins are predicted to have TF binding activity, and 119 myosin proteins are labeled positive for nuclear localization. The second aspect involves quantifying the functional relationship between gene products. While semantic similarity between GO terms has been extensively explored, it remains unclear how to derive the functional similarity between gene products from their GO annotations. In this work, the modified Hausdorff distance (MHD), which was originally proposed for image matching, is applied to functional similarity calculation for the first time. Furthermore, as a first attempt, we have incorporated the depth information of GO terms and proposed another new measure, called level-based similarity (LBS). We have evaluated MHD and LBS using expression data and protein-protein interaction (PPI) data of S. cerevisiae in comparison with six other existing approaches. Correlation coefficient analysis and receiver operating characteristic (ROC) analysis demonstrate the validity of these two new measures.
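Illustrative note: the abstract does not give the exact formulation used in the thesis, so the following is only a minimal sketch of how a modified Hausdorff distance could be computed between the GO annotation sets of two gene products. It assumes the standard definition from image matching (the larger of the two directed mean nearest-neighbour distances). The term labels, the toy distance table, and the function names are hypothetical and are not taken from the thesis.

```python
from typing import Callable, Sequence

Distance = Callable[[str, str], float]


def directed_mean_min(a_terms: Sequence[str], b_terms: Sequence[str],
                      dist: Distance) -> float:
    """Average, over terms in a_terms, of the distance to the nearest term in b_terms."""
    return sum(min(dist(a, b) for b in b_terms) for a in a_terms) / len(a_terms)


def modified_hausdorff(a_terms: Sequence[str], b_terms: Sequence[str],
                       dist: Distance) -> float:
    """Modified Hausdorff distance: the larger of the two directed mean distances."""
    if not a_terms or not b_terms:
        raise ValueError("both annotation sets must be non-empty")
    return max(directed_mean_min(a_terms, b_terms, dist),
               directed_mean_min(b_terms, a_terms, dist))


# Hypothetical term-to-term distances in [0, 1]; a real analysis would derive
# them from a semantic similarity measure defined over the GO graph.
TOY_DIST = {
    frozenset(("t1", "t2")): 0.2,
    frozenset(("t1", "t3")): 0.8,
    frozenset(("t2", "t3")): 0.6,
}


def toy_dist(x: str, y: str) -> float:
    """Symmetric lookup in the toy table; identical terms have distance 0."""
    return 0.0 if x == y else TOY_DIST[frozenset((x, y))]


protein_a = ["t1", "t2"]  # hypothetical annotations of one gene product
protein_b = ["t2", "t3"]  # hypothetical annotations of another

mhd = modified_hausdorff(protein_a, protein_b, toy_dist)
print(f"MHD = {mhd:.2f}")
```

Taking the mean rather than the maximum of the nearest-neighbour distances makes the measure less sensitive to a single outlying annotation than the classical Hausdorff distance. A similarity score could then be obtained from the distance, for example as one minus the distance when distances lie in [0, 1]; that conversion is likewise an assumption here.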
Online Catalog Link: http://lib.cityu.edu.hk/record=b3947808
Appears in Collections: CS - Master of Philosophy

Files in This Item:

File            Size    Format
abstract.html   134 B   HTML
fulltext.html   134 B   HTML

Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 
