Abstract
Background: Post-translational modifications (PTMs) occur on almost all proteins and often strongly affect the functions of the modified proteins. Phosphorylation is a crucial PTM mechanism with important regulatory functions in biological systems. Identifying the potential phosphorylation sites of a target protein may increase our understanding of the molecular processes in which it takes part.

Results: In this paper, we propose PredPhos, a computational method that can accurately predict both kinase-specific and non-kinase-specific phosphorylation sites by using optimally selected properties. The optimal combination of features was selected from a set of 153 novel structural neighborhood properties by a two-step feature selection method consisting of a random forest algorithm and a sequential backward elimination method. To overcome the class imbalance problem, we adopt an ensemble method that combines a bootstrap resampling technique, support vector machine-based fusion classifiers and a majority voting strategy. We evaluate the proposed method using both tenfold cross-validation and an independent test. Results show that our method achieves a significant improvement in prediction performance for both kinase-specific and non-kinase-specific phosphorylation sites.

Conclusions: The experimental results demonstrate that the proposed method is quite effective in predicting phosphorylation sites. The promising results derive from the new structural neighborhood properties, the novel feature selection procedure, and the ensemble method.
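The ensemble strategy described in the Results (bootstrap resampling, several base classifiers, majority voting) can be illustrated with a minimal sketch. This is not the PredPhos implementation: a nearest-centroid classifier stands in for the SVM base learners so that the example needs only the Python standard library, and the function names, `n_models` and `seed` are assumptions made for illustration.

```python
import random
from collections import Counter


def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train_centroid_classifier(pos, neg):
    """Stand-in base learner (NOT an SVM): predicts the class whose
    centroid is nearer in squared Euclidean distance."""
    c_pos, c_neg = centroid(pos), centroid(neg)

    def predict(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return 1 if d_pos <= d_neg else 0

    return predict


def train_balanced_ensemble(pos, neg, n_models=11, seed=0):
    """Train n_models base learners, each on a balanced bootstrap sample.

    Both classes are sampled with replacement down to the size of the
    positive set, which is assumed here to be the minority class.
    """
    rng = random.Random(seed)
    k = len(pos)
    models = []
    for _ in range(n_models):
        pos_sample = [rng.choice(pos) for _ in range(k)]
        neg_sample = [rng.choice(neg) for _ in range(k)]
        models.append(train_centroid_classifier(pos_sample, neg_sample))
    return models


def predict_majority(models, x):
    """Return the label that receives the most votes across the ensemble."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

Using an odd `n_models` avoids tied votes in the two-class case. Each base learner sees as many negative as positive examples, so no single model is dominated by the majority class.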
Highlights
Post-translational modifications (PTMs) occur on almost all proteins and often strongly affect the functions of modified proteins
PhosphoPep includes phosphoproteome data from four species: yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans), fly (Drosophila melanogaster) and human (Homo sapiens); it also implements a novel function for analyzing the conservation of the identified phosphorylation sites across species
Negative phosphorylation sites gathered from their respective proteins had to meet three criteria: (1) a potential negative site could not have been reported as a positive site; (2) it had to be within a protein that contained known positive sites; and (3) a negative phosphorylation site had to be solvent-inaccessible
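The three criteria for negative sites can be sketched as a simple filter. The restriction to serine, threonine and tyrosine residues, the use of relative solvent accessibility, the `max_accessibility` cutoff of 0.25 and the function name are all assumptions made for illustration; the source does not state how solvent inaccessibility was measured.

```python
def select_negative_sites(sequence, positive_positions, rel_accessibility,
                          residues=("S", "T", "Y"), max_accessibility=0.25):
    """Return 1-based positions of candidate negative phosphorylation sites.

    sequence            protein sequence as a string of one-letter codes
    positive_positions  1-based positions of known (positive) sites
    rel_accessibility   per-residue relative solvent accessibility in [0, 1]
    max_accessibility   hypothetical cutoff for "solvent-inaccessible"
    """
    positives = set(positive_positions)

    # Criterion 2: the protein must contain at least one known positive site.
    if not positives:
        return []

    negatives = []
    for pos, (aa, acc) in enumerate(zip(sequence, rel_accessibility), start=1):
        if aa not in residues:
            continue  # not a phosphorylatable residue under our assumption
        if pos in positives:
            continue  # Criterion 1: never reported as a positive site
        if acc > max_accessibility:
            continue  # Criterion 3: must be solvent-inaccessible
        negatives.append(pos)
    return negatives
```

For example, for the sequence `"MSTAYS"` with a known site at position 2, only buried S/T/Y residues other than position 2 are returned as negatives.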
Summary
Post-translational modifications (PTMs) occur on almost all proteins and often strongly affect the functions of the modified proteins. Swiss-Prot [13] is a widely used protein sequence and knowledge database that provides extensive information about post-translational modifications. PhosphoPep includes phosphoproteome data from four species: yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans), fly (Drosophila melanogaster) and human (Homo sapiens); it also implements a novel function for analyzing the conservation of the identified phosphorylation sites across species. PHOSIDA [17] is a database that aims to manage post-translational modification sites of various species, including human, mouse, fly, worm and yeast proteins. To meet the demand for analyzing the structural features of experimentally verified phosphorylation sites, Phospho3D [18] was launched to store information retrieved from Phospho.ELM, enriched with structural information and annotation at the residue level.