Support-vector-machine classification of linear functional motifs in proteins

Dariusz Plewczynski, Lucjan Stanisław Wyrwicz, Adrian Tkacz, Leszek Rychlewski, Andrzej Kloczkowski, Adam Godzik

doi:10.1007/s00894-005-0070-2

Abstract

Our algorithm predicts short linear functional motifs in proteins using only sequence information. Statistical models of these motifs are built from a database of short sequence fragments taken from proteins in the current release of the Swiss-Prot database. These segments have experimentally confirmed single-residue post-translational modifications. To make a prediction, the query protein sequence is dissected into short overlapping fragments, and each fragment is represented as a vector. Each vector is then classified by a machine-learning algorithm (a support vector machine) as potentially modifiable or not, and the resulting list of plausible post-translational modification sites in the query protein is returned to the user. The sensitivity of the classification for the various types of short linear motifs is approximately 70%. We also present a study of the human protein kinase C family as a biological application of our method.
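The dissection and vectorization steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the window half-width of 4 residues, the padding symbol `X`, the one-hot encoding, and the function names `dissect` and `encode` are all assumptions made for illustration, since the abstract does not specify them. The classification step itself is omitted; the resulting vectors would be passed to a trained support vector machine.

```python
# Illustrative sketch only: window size, padding, and encoding are assumptions,
# not details taken from the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"                             # hypothetical padding symbol for sequence ends


def dissect(sequence, half_width=4):
    """Return (position, fragment) pairs, one overlapping window per residue.

    Each fragment is centred on its residue and has length 2 * half_width + 1;
    the sequence ends are padded so that every residue gets a full window.
    """
    padded = PAD * half_width + sequence + PAD * half_width
    width = 2 * half_width + 1
    return [(i, padded[i:i + width]) for i in range(len(sequence))]


def encode(fragment):
    """One-hot encode a fragment: 20 values per position, all zero for padding."""
    vector = []
    for residue in fragment:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


if __name__ == "__main__":
    # Toy query sequence, not a real protein.
    for position, fragment in dissect("MKSPTAYLR"):
        features = encode(fragment)
        print(position, fragment, len(features))
```

Each feature vector here has 20 entries per window position, so a 9-residue window yields 180 features. Any other fixed-length numerical representation of the fragment could be substituted without changing the overall procedure.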

Full Text