Abstract

The affinity of different drug-like ligands to multiple protein targets reflects general chemical–biological interactions. Computational methods that estimate such interactions analyze the available structural information about the targets, the ligands, or both. Prediction of protein–ligand interactions based on pairwise sequence alignment provides reasonable accuracy if ligand specificity coincides well with the phylogenic taxonomy of the proteins. Methods using multiple alignment require an accurate match of functionally significant residues. These conditions may not be met for diverged protein families. To overcome these limitations, we propose an approach based on the analysis of local sequence similarity within the set of analyzed proteins. The positional scores, calculated by sequence fragment comparisons, are used as input data for a Bayesian classifier. Our approach provides a prediction accuracy comparable to or exceeding that of other methods. This was demonstrated on the popular Gold Standard test sets, which differ in sequence heterogeneity and range from a group that includes different protein families to more specific groups. A reasonable prediction accuracy was also found for protein kinases, which display weak relationships between sequence phylogeny and inhibitor specificity. Thus, our method can be applied to a broad range of protein–ligand interactions.

Highlights

  • Recognition of specific interactions of biological macromolecules is necessary for the study of regulatory processes in biological systems

  • The identified protein ligands can be applied for the study of signaling and metabolic pathways

  • To describe the protein under study, each amino acid residue was assigned a score calculated from its surrounding sequence context


Summary

Introduction

Recognition of specific interactions of biological macromolecules is necessary for the study of regulatory processes in biological systems. (Q)SAR (Quantitative Structure-Activity Relationship) methods build models in which protein identifiers serve as class-forming features, without considering protein features [3,4]. In this case, the prediction of a new target (not covered by the training data) is impossible. Several tools that predict protein targets for ligands use protein similarity matrices obtained for entire protein sequences by pairwise alignment [8,12,16,19,21,23,25]. Such a method can recognize targets if the ligand specificity correlates well with the overall sequence proximity detected by phylogenic studies. Each position of the query sequence gets a score. These values are the input data to the classifier, which estimates the protein specificity to the ligands. We demonstrated that the suggested approach is applicable to protein data with a significant degree of heterogeneity, unlike many existing methods, which are often fitted to specific areas of study [38,39].
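
The idea of giving each query position a score from fragment comparisons can be illustrated with a minimal Python sketch. It is not the authors' scoring scheme, which this summary does not specify. It assumes a fixed window centered on each residue, simple identity counting between equal-length fragments, and the best match against a single reference sequence. The names `fragment_identity`, `positional_scores`, and `half_window` are hypothetical.

```python
def fragment_identity(a: str, b: str) -> int:
    """Count positions where two equal-length fragments carry the same residue."""
    return sum(x == y for x, y in zip(a, b))


def positional_scores(query: str, reference: str, half_window: int = 2) -> list[int]:
    """Score each query position by its best-matching fragment in the reference.

    Assumption: identity counting and a fixed window stand in for whatever
    fragment comparison the actual method uses.
    """
    width = 2 * half_window + 1
    # All reference fragments of the same width as the query window.
    ref_fragments = [reference[i:i + width]
                     for i in range(len(reference) - width + 1)]
    scores = []
    for pos in range(len(query)):
        start = pos - half_window
        end = pos + half_window + 1
        if start < 0 or end > len(query) or not ref_fragments:
            scores.append(0)  # window runs off the sequence, or reference too short
            continue
        fragment = query[start:end]
        scores.append(max(fragment_identity(fragment, r) for r in ref_fragments))
    return scores


# Identical sequences: interior positions match fully, edge positions get 0.
print(positional_scores("ACDEFGH", "ACDEFGH", half_window=1))  # [0, 3, 3, 3, 3, 3, 0]
```

In a pipeline of the kind described, a vector like this (computed against the proteins of a training set) would serve as the per-position input to the classifier. How scores are aggregated across training proteins is left open here.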

Evaluation on Gold Standard and PASS Targets Datasets
Protein Kinases
Training Sets
Positional Similarity Scores
Prediction Algorithm
Computation Time
Conclusions
