Protein Descriptors Research Articles

Modulating protein-protein interactions (PPIs) with small chemical compounds is challenging. PPIs play a critical role in most cellular processes and are involved in numerous disease pathways, so novel strategies that assist the design of PPI inhibitors are of major importance. We previously reported that the knowledge-based DLIGAND2 scoring tool was the best rescoring function for improving receptor-based virtual screening (VS) performed with the Surflex docking engine on several PPI targets with experimentally known active and inactive compounds. Here, we extend our investigation by assessing the VS potential of other types of scoring functions, with an emphasis on docking-pose-derived solvent accessible surface area (SASA) descriptors, with or without machine learning (ML) classifiers. First, we explored rescoring of Surflex-generated docking poses with five GOLD scoring functions (GoldScore, ChemScore, ASP, ChemPLP, and ChemScore with Receptor Depth Scaling) and with consensus scoring. The top-ranked poses were post-processed to derive a set of protein and ligand SASA descriptors in the bound and unbound states, which were combined to derive descriptors of the docked protein-ligand complexes. Further, eight ML models (tree, bagged forest, random forest, Bayesian, support vector machine, logistic regression, neural network, and neural network with bagging) were trained using the derived SASA descriptors and validated on test sets. The results show that many SASA descriptors outperform the Surflex and GOLD scoring functions in overall performance and early recovery on this dataset. The ML models were superior to all scoring functions and rescoring approaches for most targets, yielding up to a seven-fold increase in enrichment factors at 1% of the screened collections. In particular, neural networks and random forest-based ML emerged as the best techniques for this PPI dataset, making them robust and attractive VS tools for hit-finding efforts. The presented results suggest that further exploration of docking-pose-derived SASA descriptors could be valuable for structure-based virtual screening projects and, in the present case, could assist the rational design of small-molecule PPI inhibitors.
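The "enrichment factor at 1%" quoted above is a standard early-recovery metric: the hit rate among the top-ranked 1% of compounds divided by the hit rate in the whole collection. The sketch below shows one common way to compute it. The function name, the convention that a higher score means a better rank, and the rounding of the cutoff are assumptions for illustration and are not taken from the article.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given fraction of the ranked collection.

    scores: one score per compound (assumed: higher = predicted more active).
    labels: 1 for experimentally active, 0 for inactive.
    fraction: top share of the ranked list to inspect (0.01 = 1%).
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")

    # Number of compounds in the top fraction (at least one).
    n_top = max(1, round(fraction * n_total))

    # Rank compounds from best to worst score and count actives in the top slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top slice relative to the hit rate of random selection.
    return (hits_top / n_top) / (n_actives / n_total)
```

With this definition, a value of 1 corresponds to random selection, and the maximum attainable value is limited by the share of actives in the collection.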

This study introduces a set of fuzzy spherically truncated three-dimensional (3D) multi-linear descriptors (MDs) for proteins. These indices codify geometric structural information from kth spherically truncated spatial-(dis)similarity two-tuple and three-tuple tensors. The coefficients of these truncated tensors are calculated by applying a smoothing value to the 3D structural encoding based on the relationships between two and three amino acids of a protein embedded in a sphere. Assuming that the geometrical center of the protein coincides with the center of the sphere, the distance between each amino acid involved in a specific interaction and the geometrical center of the protein can be computed. Then, the fuzzy membership degree of each amino acid in a spherical region of interest is computed with fuzzy membership functions (FMFs). Finally, the truncation value is obtained by combining the membership degrees of the interacting amino acids, using the arithmetic mean as the fusion rule. Several fuzzy membership functions with diverse biases in the calculation of amino acid memberships (e.g., Z-shaped (close to the center), PI-shaped (middle region), and A-Gaussian (far from the center)) were considered, as well as traditional truncation functions (e.g., Switching). These truncation functions were comparatively evaluated by exploring: 1) the frequency of membership degrees, 2) their variability and orthogonality, based on Shannon entropy and principal component analysis, respectively, and 3) their performance in the alignment-free prediction of protein folding rates and structural classes. These analyses revealed the singularity of the proposed fuzzy spherically truncated MDs with respect to the classical (non-truncated) ones and to the MDs truncated with traditional functions. They also showed improved predictive power, attaining an external correlation coefficient of 95.82% in folding rate modelling and an accuracy of 100% in distinguishing structural protein classes. These outcomes are better than those attained by existing approaches, justifying the theoretical contribution of this report. Thus, the fuzzy spherically truncated protein descriptors from MuLiMs-MCoMPAs (http://tomocomd.com/mulims-mcompas) are promising alignment-free predictors for modeling protein functions and properties.
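The truncation scheme described above (distance to the geometric center, a fuzzy membership degree per amino acid, arithmetic-mean fusion for an interacting pair) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the textbook Z-shaped membership function, one representative coordinate per residue, and hypothetical breakpoints `a` and `b`. It is not the MuLiMs-MCoMPAs implementation, and none of the function names come from that software.

```python
import math


def z_shaped(x, a, b):
    """Textbook Z-shaped membership: 1 for x <= a, 0 for x >= b, smooth in between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    mid = (a + b) / 2.0
    if x <= mid:
        return 1.0 - 2.0 * ((x - a) / (b - a)) ** 2
    return 2.0 * ((x - b) / (b - a)) ** 2


def geometric_center(coords):
    """Unweighted mean of the residue coordinates (assumed: one 3D point per residue)."""
    n = len(coords)
    return tuple(sum(point[axis] for point in coords) / n for axis in range(3))


def membership_degrees(coords, a, b):
    """Membership of each residue, based on its distance to the geometric center."""
    center = geometric_center(coords)
    return [z_shaped(math.dist(point, center), a, b) for point in coords]


def pair_truncation(memberships, i, j):
    """Truncation value for residues i and j: arithmetic mean of their memberships."""
    return (memberships[i] + memberships[j]) / 2.0


# Hypothetical coordinates (in angstroms) and breakpoints, for illustration only.
coords = [(0, 0, 0), (2, 0, 0), (-2, 0, 0), (0, 10, 0), (0, -10, 0)]
mu = membership_degrees(coords, a=2.0, b=10.0)
weight = pair_truncation(mu, 0, 3)  # a central residue paired with a peripheral one
```

The same fusion extends to three-tuple relations by averaging three membership degrees, and swapping `z_shaped` for another membership function shifts the emphasis toward the middle or outer region of the sphere.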

Articles published on Protein Descriptors

1D and 2D Feature Extraction Based on AAC and DC Protein Descriptors for Classification of Acetylation in Lysine Proteins using Convolutional Neural Network

Descriptor-augmented machine learning for enzyme-chemical interaction predictions

DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options.

Fuse feeds as one: cross-modal framework for general identification of AMPs.

3DDPDs: describing protein dynamics for proteochemometric bioactivity prediction. A case for (mutant) G protein-coupled receptors

Protein-Ligand Binding Affinity Prediction Exploiting Sequence Constituent Homology.

A study of the interaction space of two lactate dehydrogenase isoforms (LDHA and LDHB) and some of their inhibitors using proteochemometrics modeling

A Hybrid Docking and Machine Learning Approach to Enhance the Performance of Virtual Screening Carried out on Protein-Protein Interfaces.

Predicting unseen antibodies’ neutralizability via adaptive graph neural networks

ICAN: Interpretable cross-attention network for identifying drug and target protein interactions.

A multiscale modeling method for therapeutic antibodies in ion exchange chromatography.

Statistical proofs of the interdependence between nearest neighbor effects on polypeptide backbone conformations

Handcrafted versus non-handcrafted (self-supervised) features for the classification of antimicrobial peptides: complementary or redundant?

Fuzzy spherical truncation-based multi-linear protein descriptors: From their definition to application in structural-related predictions.

DRDB: A Machine Learning Platform to Predict Chemical-Protein Interactions towards Diabetic Retinopathy.

On the Frustration to Predict Binding Affinities from Protein-Ligand Structures with Deep Neural Networks.

Proteochemometrics modeling for prediction of the interactions between caspase isoforms and their inhibitors.

Machine learning based predictive model for the analysis of sequence activity relationships using protein spectra and protein descriptors.

In-silico screening of potential target transporters for glycyrrhetinic acid (GA) via deep learning prediction of drug-target interactions

Deep Learning Model for Protein Disease Classification
