Please use this identifier to cite or link to this item:
http://hdl.handle.net/2031/5770
|
| Title: | In silico protein phosphorylation site prediction using pattern recognition approaches |
| Other Titles: | Ji yu mo shi shi bie fang fa de dan bai zhi lin suan hua wei dian yu ce 基於模式識別方法的蛋白質磷酸化位點預測 |
| Authors: | Deng, Zhongkai (鄧鍾凱) |
| Department: | Department of Computer Science |
| Degree: | Master of Philosophy |
| Issue Date: | 2009 |
| Publisher: | City University of Hong Kong |
| Subjects: | Phosphorylation. Pattern recognition systems. Bioinformatics. |
| Notes: | CityU Call Number: QD281.P46 D46 2009 ix, 93 leaves : ill. 30 cm. Thesis (M.Phil.)--City University of Hong Kong, 2009. Includes bibliographical references (leaves [83]-92) |
| Type: | thesis |
| Abstract: | Protein phosphorylation (or simply phosphorylation for short) is a ubiquitous
post-translational modification in both prokaryotic and eukaryotic organisms,
catalyzed by enzymes called kinases. Phosphorylation plays a significant role
in a wide range of cellular processes. With the fast-growing
number of novel protein sequences being published, there is an increasing need to identify
phosphorylation sites in these sequences and to specify the type of kinase(s)
involved. Because experimental methods that identify phosphorylation sites
in vitro are usually labor-intensive and time-consuming, in silico prediction of
phosphorylation sites is desirable and popular for its convenience and
speed.
One of the most challenging issues in phosphorylation site prediction is the
complex substrate specificity of the large kinase family, which makes the problem
well suited to pattern recognition approaches such as artificial neural networks
(ANN). In this thesis, we introduce a novel classifier ensemble approach called
Bagging-Adaboost Ensemble (BAE) and first apply this ensemble framework to
the eukaryotic protein phosphorylation prediction problem. BAE combines the bagging
and adaboost techniques to improve the accuracy, stability and robustness
of the results. This improvement is achieved by (i) enhancing
the diversity of the training data set during the bagging process and (ii) adaptively
weighting individual samples of the training data set during the adaboost process.
Although a number of ANN-based approaches for predicting phosphorylation sites
have been developed in the last decade, little effort has been put into the generation and selection of features of phosphorylation sites, which is a crucial
step toward good performance. Hence, we analyze a broad spectrum of features,
including discrete alphabets, evolutionary features, physicochemical features and
structural features, based on class separability criteria. We further
propose a heterogeneous feature representation for describing phosphorylation sites,
which integrates features with high discriminatory power but low correlation with
one another, thus reducing the dimensionality of the code vector while
retaining as much of the class discriminatory information as possible.
We evaluate BAE on a large database of experimentally verified phosphorylation
sites, and compare the results with existing prediction systems that adopt
neural networks (NN) and support vector machines (SVM). The experimental results
show that BAE outperforms many existing methods. |
| Online Catalog Link: | http://lib.cityu.edu.hk/record=b2375032 |
| Appears in Collections: | CS - Master of Philosophy
|
Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.
|