Please use this identifier to cite or link to this item:
http://hdl.handle.net/2031/6212
|
| Title: | Functional analysis of gene products based on gene ontology |
| Other Titles: | Ji yu gene ontology de dan bai zhi gong neng fen xi 基於 gene ontology 的蛋白質功能分析 |
| Authors: | Tan, Lirong (譚麗容) |
| Department: | Department of Computer Science |
| Degree: | Master of Philosophy |
| Issue Date: | 2010 |
| Publisher: | City University of Hong Kong |
| Subjects: | Genetics -- Data processing. Proteins -- Analysis. Bioinformatics. |
| Notes: | CityU Call Number: QH441.2 .T36 2010. xii, 80 leaves : col. ill. ; 30 cm. Thesis (M.Phil.)--City University of Hong Kong, 2010. Includes bibliographical references (leaves 69-80). |
| Type: | thesis |
| Abstract: | With rapid advances in genomic sequencing, large numbers of biological sequences
are becoming available. Experimental methods for functional interpretation, however,
are inherently slow. As a result, the gap between sequence and function
continues to widen. Under these circumstances, computational analysis of the function
of biological sequences has become one of the most important tasks in
bioinformatics.
Before any computational analysis, one problem that must first be addressed is the
elusive definition of protein function. Scientists from different fields may use different
words to describe the same functional property, which makes such descriptions hard
for machines to process. In an attempt to unify the definitions of functional properties,
many functional classification systems have been proposed. Gene Ontology (GO) is
one of them. Unlike all other schemes, which are tailored for specific genomes even if
they may be generalized later, GO sets out with the goal of providing a common functional
ontology across the whole biological world. This underlying shift in ideology has led to
the rapid recognition of GO as the most successful functional scheme, and it is therefore
widely used in functional analysis. As a preliminary exploration of
GO-based functional analysis, we have concentrated our studies on two main aspects.
First, we have tried to predict protein function from domain content using machine
learning techniques. Two new models have been proposed: a Correlation Coefficient
based model (CC-M) and a Support Vector Machine (SVM) based model (SVM-M).
We have developed our models in the form of predictors for all GO terms with
manually curated annotations. In comparison with the Bayesian probabilistic approach
published previously, our methods are shown to be better at dealing
with incomplete training data. In particular, the CC-M method is suitable for GO
terms with extremely low occurrence frequency, and the SVM-M method for the remaining
GO terms. CC-M and SVM-M are therefore integrated into a
single model (CC-SVM) that combines their respective advantages. With the CC-SVM
based predictors, predictions have been made for two GO annotations, Transcription
Factor (TF) binding activity (GO:0008134) and nucleus (GO:0005634), for proteins
in the myosin family. Eighteen myosin proteins are predicted to have TF binding activity, and
119 myosin proteins are predicted to be positive for nuclear localization.
The second aspect is the quantification of the functional relationship
between gene products. While semantic similarity between GO terms has been
extensively explored, it remains unclear how to derive the functional similarity
between gene products from their GO annotations. In this work, the modified Hausdorff
distance (MHD), which was originally proposed for image matching, is applied for the first time to
functional similarity calculation. Furthermore, we have incorporated, for the first time, the depth
information of GO terms and proposed another new measure, called level-based
similarity (LBS). We have evaluated MHD and LBS using expression data and
protein-protein interaction (PPI) data of S. cerevisiae in comparison with six other existing
approaches. Correlation coefficient analysis and receiver operating characteristic
(ROC) analysis demonstrate the validity of these two new measures. |
| Online Catalog Link: | http://lib.cityu.edu.hk/record=b3947808 |
| Appears in Collections: | CS - Master of Philosophy
|
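Illustrative sketch (not taken from the thesis): the abstract applies the modified Hausdorff distance (MHD) to the sets of GO terms annotating two gene products. The code below shows the standard MHD definition from the image-matching literature, in which each directed distance is the mean, over the terms of one set, of the distance to the nearest term in the other set, and the MHD is the larger of the two directed distances. The term-to-term distance is an assumption here: the record does not say which semantic distance the thesis uses, so `toy_term_distance`, the labels `t1` to `t3`, and the numeric values are hypothetical placeholders.

```python
from typing import Callable, Iterable

TermDistance = Callable[[str, str], float]


def directed_distance(a_terms: Iterable[str], b_terms: Iterable[str],
                      term_distance: TermDistance) -> float:
    """Mean, over terms in a_terms, of the distance to the nearest term in b_terms."""
    a, b = list(a_terms), list(b_terms)
    if not a or not b:
        raise ValueError("both annotation sets must be non-empty")
    return sum(min(term_distance(x, y) for y in b) for x in a) / len(a)


def modified_hausdorff(a_terms: Iterable[str], b_terms: Iterable[str],
                       term_distance: TermDistance) -> float:
    """Modified Hausdorff distance: the larger of the two directed distances."""
    a, b = list(a_terms), list(b_terms)
    return max(directed_distance(a, b, term_distance),
               directed_distance(b, a, term_distance))


# Hypothetical pairwise distances between placeholder terms (assumption:
# any symmetric term-level distance could be plugged in instead).
_TOY = {
    frozenset(("t1", "t2")): 0.4,
    frozenset(("t1", "t3")): 0.8,
    frozenset(("t2", "t3")): 0.2,
}


def toy_term_distance(x: str, y: str) -> float:
    """Return 0 for identical terms, otherwise look up the toy table."""
    return 0.0 if x == y else _TOY[frozenset((x, y))]


if __name__ == "__main__":
    product_a = {"t1", "t2"}  # placeholder annotation set of gene product A
    product_b = {"t2", "t3"}  # placeholder annotation set of gene product B
    print(modified_hausdorff(product_a, product_b, toy_term_distance))
```

A distance computed this way would still need to be converted into a similarity score (for example by a decreasing transform) before being compared with other similarity measures; that step is left out because the record does not describe it.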
Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.
|