Protein Descriptors Research Articles

Phosphorylation is a ubiquitous type of post-translational modification (PTM) that occurs in both eukaryotic and prokaryotic cells, in which a phosphate group is attached to amino acid residues. These specific residues, i.e., serine (S), threonine (T), and tyrosine (Y), exhibit diverse functions at the molecular level. Recent studies have determined that some diseases, such as cancer, diabetes, and neurodegenerative diseases, are caused by abnormal phosphorylation. Because of its potential applications in biological research and drug development, the large-scale identification of phosphorylation sites has attracted interest. Existing wet-lab technologies for targeting phosphorylation sites are expensive and time-consuming. Thus, computational algorithms that can efficiently accelerate the annotation of phosphorylation sites from massive protein sequences are needed. Numerous machine learning-based methods have been implemented for phosphorylation site prediction. However, despite extensive efforts, existing computational approaches continue to have inadequate performance, particularly in terms of overall accuracy (ACC), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC). In this paper, we report DeepPPSite, a novel deep learning-based predictor designed to overcome these performance hurdles, which was constructed using a stacked long short-term memory recurrent network for predicting phosphorylation sites. The proposed technique efficiently learns the protein representations from conjoint protein descriptors. The experimental results indicated that our model achieved superior performance on the training dataset for S, T, and Y, with MCC values of 0.608, 0.602, and 0.558, respectively, using a 10-fold cross-validation test. We further determined the generalization efficacy of the proposed predictor DeepPPSite by conducting a rigorous independent test. The predictive MCC values were 0.358, 0.356, and 0.350 for the S, T, and Y phosphorylation sites, respectively. Rigorous cross-validation and independent validation tests for the three types of phosphorylation sites demonstrated that the designed DeepPPSite tool significantly outperforms state-of-the-art methods.
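
Since the abstract reports its results mainly as MCC values, a minimal sketch of how that metric is computed from binary predictions may help. The function names and the toy labels below are illustrative assumptions, not taken from the DeepPPSite paper; 1 marks a phosphorylation site and 0 a non-site.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = site, 0 = non-site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom


# Toy labels, purely illustrative (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
score = mcc(*confusion_counts(y_true, y_pred))
print(round(score, 3))
```

MCC ranges from -1 to 1 and stays informative when positive and negative sites are imbalanced, which is one reason it is commonly reported alongside ACC and AUC.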

Novel 3D protein descriptors based on bilinear, quadratic and linear algebraic maps in ℝⁿ are proposed. The latter employs the kth 2-tuple (dis)similarity matrix to codify information related to covalent and non-covalent interactions in these biopolymers. The calculation of the inter-amino acid distances is generalized by using several dissimilarity coefficients, where normalization procedures based on the simple stochastic and mutual probability schemes are applied. A new local-fragment approach based on amino acid types and amino acid groups is proposed to characterize regions of interest in proteins. Topological and geometric macromolecular cutoffs are defined using local and total indices to highlight non-covalent interactions existing between the side chains of each amino acid. Moreover, local and total indices calculations are generalized considering a LEGO approach, by using several aggregation operators. Collinearity and variability analyses are performed to evaluate every generalizing component applied to the definition of these novel indices. These experiments are oriented to reduce the number of molecular descriptors (MDs) used to build prediction models. The predictive power of the proposed indices was evaluated using two benchmark datasets, folding rate and secondary structural classification of proteins. The proposed MDs are modeled using the following strategies: Multiple Linear Regression (MLR) and Support Vector Machine (SVM), respectively. The best regression model developed for the folding rate of proteins yields a cross-validation coefficient of 0.875 (Test Set), and the best model developed for secondary structural classification correctly classified 98% of instances (Test Set). These statistical parameters are superior to the ones obtained with existing MDs reported in the literature. Overall, the new theoretical generalization improved the information captured by the MDs, allowing a better correlation between these two evaluated benchmark datasets and the proposed indices. The optimal theoretical configurations defined for the calculation of these MDs exhibit low collinearity and little information redundancy among them. These theoretical configurations and the software are available at http://tomocomd.com/mulims-mcompas.
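
The abstract names two normalization schemes for the inter-amino acid (dis)similarity matrix without defining them. The sketch below assumes one common reading: "simple stochastic" divides each entry by its row sum, and "mutual probability" divides each entry by the sum of the whole matrix. Both definitions, the function names, and the toy matrix are assumptions for illustration and are not confirmed by the abstract.

```python
def simple_stochastic(matrix):
    """Row-normalize: each non-zero-sum row is scaled to sum to 1 (assumed definition)."""
    result = []
    for row in matrix:
        row_sum = sum(row)
        result.append([x / row_sum if row_sum else 0.0 for x in row])
    return result


def mutual_probability(matrix):
    """Divide every entry by the total sum of the matrix (assumed definition)."""
    total = sum(sum(row) for row in matrix)
    return [[x / total if total else 0.0 for x in row] for row in matrix]


# Hypothetical 3-residue distance matrix, purely illustrative.
distances = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
]

for row in simple_stochastic(distances):
    print(row)
for row in mutual_probability(distances):
    print(row)
```

Under these assumptions, the row-normalized matrix is generally no longer symmetric, while the mutual probability matrix keeps the symmetry of the input.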

Articles published on Protein Descriptors

The applications of deep learning algorithms on in silico druggable proteins identification

Evaluation of protein descriptors in computer-aided rational protein engineering tasks and its application in property prediction in SARS-CoV-2 spike glycoprotein

Could network structures generated with simple rules imposed on a cubic lattice reproduce the structural descriptors of globular proteins?

Effect of Various Sequence Descriptors in Predicting Human Protein-protein Interactions Using ANN-based Prediction Models

Quantitative Structure Activity Relationship (QSAR) Study of a Series of Molecules Derived from 2,3-dihydro-1H-perimidine having Activity against Protein Tyrosine Phosphatase 1B

Quantitative prediction model for affinity of drug–target interactions based on molecular vibrations and overall system of ligand-receptor

Revealing the Mutation Patterns of Drug-Resistant Reverse Transcriptase Variants of Human Immunodeficiency Virus through Proteochemometric Modeling.

Computational Prediction of Compound-Protein Interactions for Orphan Targets Using CGBVS.

Identification of Targeted Proteins by Jamu Formulas for Different Efficacies Using Machine Learning Approach.

Biomedical Data and Deep Learning Computational Models for Predicting Compound-Protein Relations

Implementation protein sequence segmentation in AAC and DC as protein descriptors for improving a classification performance of acetylation prediction

In Silico Prediction of Protein Adsorption Energy on Titanium Dioxide and Gold Nanoparticles.

DeepPPSite: A deep learning-based model for analysis and prediction of phosphorylation sites using efficient sequence information

Protein-Protein Interactions Efficiently Modeled by Residue Cluster Classes.

Deep Dive into Machine Learning Models for Protein Engineering.

Evaluation of deep and shallow learning methods in chemogenomics for the prediction of drugs specificity

Set of Approaches Based on Position Specific Scoring Matrix and Amino Acid Sequence for Primary Category Enzyme Classification

BioMedR: an R/CRAN package for integrated data analysis pipeline in biomedical study.

LEGO-based generalized set of two linear algebraic 3D bio-macro-molecular descriptors: Theory and validation by QSARs

Physicochemical n-Grams Tool: A tool for protein physicochemical descriptor generation via Chou's 5-step rule.
