City University of Hong Kong


Title: In silico protein phosphorylation site prediction using pattern recognition approaches
Other Titles: Ji yu mo shi shi bie fang fa de dan bai zhi lin suan hua wei dian yu ce
Authors: Deng, Zhongkai (鄧鍾凱)
Department: Department of Computer Science
Degree: Master of Philosophy
Issue Date: 2009
Publisher: City University of Hong Kong
Subjects: Phosphorylation.
Pattern recognition systems.
Notes: CityU Call Number: QD281.P46 D46 2009
ix, 93 leaves : ill. 30 cm.
Thesis (M.Phil.)--City University of Hong Kong, 2009.
Includes bibliographical references (leaves [83]-92)
Type: thesis
Abstract: Protein phosphorylation (or simply phosphorylation) is a ubiquitous post-translational modification in both prokaryotic and eukaryotic organisms, catalyzed by enzymes called kinases. Phosphorylation plays a significant role in a wide range of cellular processes. With the fast-growing number of novel protein sequences being published, there is an increasing need to identify phosphorylation sites in these sequences and to specify the type of kinase(s) involved. Whereas experimental methods that identify phosphorylation sites in vitro are usually labor-intensive and time-consuming, in silico prediction of phosphorylation sites is highly desirable and popular for its convenience and speed. One of the most challenging issues in phosphorylation site prediction is the complex substrate specificity of the large kinase family, which makes the problem well suited to pattern recognition approaches such as artificial neural networks (ANN). In this thesis, we introduce a novel classifier ensemble approach called Bagging-Adaboost Ensemble (BAE) and apply this ensemble framework for the first time to the eukaryotic protein phosphorylation prediction problem. BAE combines the bagging and adaboost techniques to improve the accuracy, stability and robustness of the result. This improvement is accomplished by (i) enhancing the diversity of the training data set during the bagging process and (ii) adaptively weighting individual samples of the training data set during the adaboost process.
Although a number of ANN-based approaches for predicting phosphorylation sites have been developed in the last decade, little effort has been put into the generation and selection of features of phosphorylation sites, which is a crucial step toward good performance. Hence, we analyze a broad spectrum of features, including discrete alphabets, evolutionary features, physicochemical features and structural features, based on class separability criteria. We further propose a heterogeneous feature representation for describing phosphorylation sites, which integrates features with high discriminatory power but low correlation with one another, thus reducing the dimensionality of the code vector while retaining as much of the class discriminatory information as possible. We evaluate BAE on a large database of experimentally verified phosphorylation sites, and compare the results with existing prediction systems that adopt neural networks (NN) and support vector machines (SVM). The experimental results show that BAE outperforms many existing methods.
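Illustrative sketch (not part of the catalog record or the thesis): the abstract describes BAE as bagging combined with adaboost, where bagging diversifies the training data and adaboost reweights individual samples. The Python below shows one way such a combination could look. The abstract does not give the thesis's base learner, feature encoding or parameters, so the decision stumps, the function names (train_bae, bae_predict and the helpers) and the toy one-dimensional data are all assumptions made for illustration.

```python
import math
import random

def train_stump(X, y, w):
    """Find the (feature, threshold, polarity) stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] >= t else -pol) != yi)
                if err < best_err:
                    best, best_err = (j, t, pol), err
    return best, best_err

def stump_predict(stump, row):
    j, t, pol = stump
    return pol if row[j] >= t else -pol

def train_adaboost(X, y, rounds):
    """AdaBoost over stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start from uniform sample weights
    model = []
    for _ in range(rounds):
        stump, err = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Adaptive weights: misclassified samples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

def train_bae(X, y, n_bags=5, rounds=10, seed=0):
    """Bagging step: train one AdaBoost model per bootstrap resample."""
    rng = random.Random(seed)
    n = len(X)
    bags = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        bags.append(train_adaboost([X[i] for i in idx],
                                   [y[i] for i in idx], rounds))
    return bags

def bae_predict(bags, row):
    """Majority vote over the bagged AdaBoost models."""
    votes = sum(adaboost_predict(model, row) for model in bags)
    return 1 if votes >= 0 else -1

if __name__ == "__main__":
    # Toy data (assumed): one numeric feature, label +1 for the upper half.
    X = [[i / 20.0] for i in range(20)]
    y = [-1] * 10 + [1] * 10
    bags = train_bae(X, y, n_bags=5, rounds=3)
    print(bae_predict(bags, [0.02]), bae_predict(bags, [0.97]))
```

In a real phosphorylation predictor, each row would be a feature vector encoding the sequence window around a candidate site, and the stump would likely be replaced by a stronger base classifier; both choices are left open here.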
Appears in Collections: CS - Master of Philosophy

Files in This Item:

File            Size     Format
abstract.html   134 B    HTML
fulltext.html   134 B    HTML

Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.

