Abstract

Protein methylation plays vital roles in many biological processes and has been implicated in various human diseases. Identifying methylation sites is an initial but crucial step toward fully understanding the mechanisms underlying methylation, both for drug design and for research on methylation-related diseases. High-throughput bioinformatics methods have become imperative for predicting these sites. In this study, we developed a novel method that predicts protein methylation sites based only on sequence conservation. Conservation difference profiles between methylated and non-methylated peptides were constructed using information entropy (IE) over a wider neighbor interval around the methylation sites, so that all of the environmental information was incorporated. The distinctive neighbor residues were then identified by their information gain (IG) importance scores. The most representative models were constructed with a support vector machine (SVM) for arginine and lysine methylation separately. These models yielded promising results on both the benchmark dataset and an independent test set. They were used to screen the entire human proteome, and many unknown substrates were identified. These results indicate that our method can serve as a useful supplement to elucidate the mechanism of protein methylation and facilitate hypothesis-driven experimental design and validation.
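
As an illustration of the entropy-based conservation profile described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes equal-length peptides centered on the candidate site and defines the difference profile as the per-position Shannon entropy of the non-methylated set minus that of the methylated set; the abstract does not give the exact definition. The function names and the toy peptides are hypothetical.

```python
import math
from collections import Counter


def position_entropy(peptides, pos):
    """Shannon entropy (in bits) of the residue distribution at one position."""
    counts = Counter(p[pos] for p in peptides)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(peptides):
    """Entropy at every position of a set of equal-length peptides."""
    length = len(peptides[0])
    return [position_entropy(peptides, i) for i in range(length)]


def conservation_difference(positive, negative):
    """Per-position entropy difference (negative set minus positive set).

    Assumption: a larger value means the position is more conserved in the
    methylated peptides than in the non-methylated ones.
    """
    pos_profile = entropy_profile(positive)
    neg_profile = entropy_profile(negative)
    return [n - p for p, n in zip(pos_profile, neg_profile)]


# Hypothetical toy peptides, each centered on a lysine (index 2).
methylated = ["GAKRS", "GAKRT", "GSKRS", "GAKRA"]
non_methylated = ["LEKDS", "PAKQT", "GTKHV", "MSKRA"]

profile = conservation_difference(methylated, non_methylated)
```

In this toy data the central lysine is invariant in both sets, so its difference is zero, while the first position is fully conserved only in the methylated set and therefore receives the largest difference.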

Highlights

  • Protein methylation plays vital roles in many biological processes and has been implicated in various human diseases

  • The polypeptide chains that are created by ribosomes undergo a series of “product-forming” steps, such as cutting, folding and posttranslational modification (PTM)

  • We attempted to identify distinctive positions at the far ends of a longer peptide sequence by combining position ranking via information gain (IG) with stepwise position selection via support vector machine (SVM)
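
The position-ranking step in the last highlight can be sketched as follows. This is a minimal illustration under stated assumptions, not the published procedure: it treats the residue at each position as a categorical feature and computes the standard information gain of the class label given that feature. The function names and toy data are hypothetical, and the subsequent stepwise SVM selection is not shown.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(peptides, labels, pos):
    """IG of the class label given the residue observed at position pos."""
    groups = defaultdict(list)
    for peptide, label in zip(peptides, labels):
        groups[peptide[pos]].append(label)
    # Conditional entropy: weighted average over the residues seen at pos.
    conditional = sum(len(g) / len(labels) * entropy(g)
                      for g in groups.values())
    return entropy(labels) - conditional


def rank_positions(peptides, labels):
    """Position indices sorted from most to least informative."""
    scores = [(information_gain(peptides, labels, i), i)
              for i in range(len(peptides[0]))]
    # sorted() is stable, so tied positions keep their original order.
    return [i for _, i in sorted(scores, key=lambda item: -item[0])]


# Hypothetical toy data: 1 = methylated, 0 = non-methylated.
peptides = ["ARK", "GRK", "ADK", "GDK"]
labels = [1, 1, 0, 0]

ranking = rank_positions(peptides, labels)
```

Here the middle position separates the two classes perfectly and is ranked first; the other two positions carry no information about the label.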


Summary

Introduction

Protein methylation plays vital roles in many biological processes and has been implicated in various human diseases. The model was used to screen the entire human proteome, and many unknown substrates were identified. These results indicate that our method can serve as a useful supplement to elucidate the mechanism of protein methylation and facilitate hypothesis-driven experimental design and validation.

Shi et al.[20] presented a method called PLMLA that incorporated protein sequence information, secondary structure and amino acid properties to predict methyllysine sites. Qiu et al.[24] developed a method called iMethyl-PseAAC by incorporating physicochemical, sequence evolutionary, and structural information into a pseudo amino acid composition analysis. Most of these methods applied an orthogonal encoding scheme to characterize the peptide sequence information, such that each amino acid is always represented by the same 20-bit binary vector, regardless of where it occurs.

The source code, datasets and SVM models are freely available at http://cic.scu.edu.cn/bioinformatics/SourceCode_and_SVMmodel.zip
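
The orthogonal encoding scheme used by the earlier methods can be illustrated with a short sketch. The alphabetical residue order and the all-zero vector for non-standard characters (for example, padding symbols) are assumptions made here for illustration; the cited tools may use different conventions.

```python
# Assumed residue order: the 20 standard amino acids, alphabetical by
# one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def orthogonal_encode(peptide):
    """Concatenate one 20-bit one-hot vector per residue of the peptide."""
    vector = []
    for residue in peptide:
        bits = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:  # unknown characters stay all-zero (assumption)
            bits[index] = 1
        vector.extend(bits)
    return vector


encoded = orthogonal_encode("AKR")  # 3 residues -> 60 bits
```

Because the vector for a residue does not depend on its position, the block for lysine in `orthogonal_encode("KA")` is identical to the one in `orthogonal_encode("AK")`; only its offset within the concatenated vector changes.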
