Abstract
Protein ubiquitylation is an important posttranslational modification (PTM) that is involved in diverse biological processes and plays an essential role in the regulation of physiological mechanisms and diseases. The Protein Lysine Modifications Database (PLMD) has accumulated abundant ubiquitylated proteins with their substrate sites for more than 20 species. Numerous studies have consequently developed a variety of ubiquitylation site prediction tools across all species, mainly relying on predefined sequence features and machine learning algorithms. However, the differences in ubiquitylation patterns between these species remain unclear. In this work, the sequence-based characterization of ubiquitylated substrate sites revealed remarkable differences among plants, animals, and fungi. An improved word-embedding scheme based on a transfer learning strategy was then combined with a multilayer convolutional neural network (CNN) for identifying protein ubiquitylation sites. For the prediction of plant ubiquitylation sites, the proposed deep learning scheme outperformed the machine learning-based methods, with an accuracy of 75.6%, precision of 73.3%, recall of 76.7%, F-score of 0.7493, and AUC of 0.82 on the independent testing set. Although the specificity of ubiquitylated substrate sites is complicated, this work demonstrates that the word-embedding method can extract informative features and help identify ubiquitylated sites. To accelerate the investigation of protein ubiquitylation, the data sets and source code used in this study are freely available at https://github.com/wang-hong-fei/DL-plant-ubsites-prediction.
Highlights
As one of the most important posttranslational modification (PTM) processes, ubiquitylation is a modification process in which one or more ubiquitin molecules covalently bind to substrate proteins under the action of a series of enzymes (E1, E2, E3) (Weissman, 2001)
Because ubiquitylated protein data are collected from various eukaryotic species, Ubisite (Huang et al, 2016) took species evolution into account by using position-specific scoring matrices (PSSM), which are calculated with PSI-BLAST
The multilayer convolutional neural network (CNN) was employed as a classifier and achieved acceptable performance for plant ubiquitination site prediction
Summary
As one of the most important posttranslational modification (PTM) processes, ubiquitylation is a modification process in which one or more ubiquitin molecules covalently bind to substrate proteins under the action of a series of enzymes (E1, E2, E3) (Weissman, 2001). The ubiquitin-proteasome pathway (UPP) is the most important protein degradation pathway in eukaryotic cells and participates in various physiological processes, including transcription regulation, cell cycle, apoptosis, DNA damage repair, metabolism, and immunity (Tu et al, 2012). Numerous machine learning methods have been proposed for automatic prediction of ubiquitination sites. Qiu et al (2015), noting that the sequence-order information of proteins may be lost when only the amino acid composition is considered, used the pseudo-amino acid composition (PseAAC) to preserve these essential features and developed iUbiq-Lys. Because ubiquitylated protein data are collected from various eukaryotic species, Ubisite (Huang et al, 2016) took species evolution into account by using position-specific scoring matrices (PSSM), which are calculated with PSI-BLAST. As a promising approach to modeling structured data, deep learning can extract features from the original data automatically without feature engineering, so that potentially essential features are not overlooked. He et al (2018) applied deep learning to ubiquitination site prediction and achieved good performance on their testing set.
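To make the idea of learning features directly from sequence more concrete, the following is a minimal sketch of how lysine-centered windows could be cut from a protein sequence and turned into integer-indexed "words" that an embedding layer might consume. It is not the authors' implementation: the flank size, the k-mer length, the padding symbol `X`, and all function names are assumptions chosen for illustration only.

```python
from typing import Dict, List, Tuple

PAD = "X"  # hypothetical padding symbol for windows that run past the sequence ends


def lysine_windows(sequence: str, flank: int = 15) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in the sequence.

    Each window has length 2 * flank + 1 with the lysine at its center.
    The flank size is an assumption, not a value taken from the paper.
    """
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


def kmer_words(window: str, k: int = 3) -> List[str]:
    """Split a window into overlapping k-mers, treated here as 'words'."""
    return [window[i : i + k] for i in range(len(window) - k + 1)]


def build_vocab(tokenized: List[List[str]]) -> Dict[str, int]:
    """Assign an integer index to each distinct word; 0 is kept for unknown words."""
    vocab: Dict[str, int] = {}
    for words in tokenized:
        for word in words:
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def encode(words: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map words to indices, using 0 for words not seen when building the vocabulary."""
    return [vocab.get(word, 0) for word in words]


if __name__ == "__main__":
    toy_sequence = "MKTAK"  # toy example, not real data
    sites = lysine_windows(toy_sequence, flank=2)
    tokenized = [kmer_words(window) for _, window in sites]
    vocab = build_vocab(tokenized)
    for (position, window), words in zip(sites, tokenized):
        print(position, window, encode(words, vocab))
```

The resulting index sequences would typically be passed to an embedding layer followed by convolutional layers; that part is omitted here because the source does not specify the architecture in this section.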