Abstract

The interactions of proteins with DNA, RNA, peptides, and carbohydrates play key roles in various biological processes. Studies of uncharacterized protein–molecule interactions could be aided by accurate predictions of the residues that bind partner molecules. However, existing methods for predicting binding residues on proteins remain relatively inaccurate because databases contain only a limited number of complex structures. As different types of molecules partially share chemical binding mechanisms, predictions for each molecule type should benefit from binding information on the other molecule types. In this study, we employed a multi-task deep learning strategy to develop MTDsite, a new sequence-based method that simultaneously predicts binding residues/sites for multiple important molecule types. By combining four training sets for DNA-, RNA-, peptide-, and carbohydrate-binding proteins, our method yielded accurate and robust predictions, with AUC values of 0.852, 0.836, 0.758, and 0.776 on the respective independent test sets, which are 0.52% to 6.6% better than other state-of-the-art methods. To the best of our knowledge, this is the first method to use a multi-task framework to predict binding sites for multiple molecule types simultaneously.
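The multi-task idea can be pictured as hard parameter sharing: one shared network maps a residue's features to a hidden representation, and a separate output head per molecule type turns that representation into a binding probability. The following is a minimal, dependency-free sketch under that assumption only; a single dense layer stands in for whatever shared network MTDsite actually uses, and `MultiTaskSketch`, `masked_bce`, and the layer sizes are hypothetical. Because each training residue comes from one of the four datasets, it is assumed to carry a label for a single molecule type, so the loss is computed on that head alone while the shared layer still sees residues from all four datasets.

```python
import math
import random
from typing import Dict, List

TASKS = ["DNA", "RNA", "peptide", "carbohydrate"]
Matrix = List[List[float]]


def dense(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer: y_j = sum_i x_i * W[i][j] + b_j."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiTaskSketch:
    """Hypothetical model: one shared hidden layer, one logistic head per task."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.shared_w = random_matrix(rng, n_in, n_hidden)
        self.shared_b = [0.0] * n_hidden
        # Each head maps the shared hidden vector to a single logit.
        self.heads = {t: (random_matrix(rng, n_hidden, 1), [0.0]) for t in TASKS}

    def predict(self, features: List[float]) -> Dict[str, float]:
        """Binding probability of one residue for every molecule type."""
        pre = dense(features, self.shared_w, self.shared_b)
        hidden = [max(0.0, a) for a in pre]  # ReLU on the shared layer
        return {t: sigmoid(dense(hidden, w, b)[0])
                for t, (w, b) in self.heads.items()}


def masked_bce(probs: Dict[str, float], task: str, label: int,
               eps: float = 1e-7) -> float:
    """Binary cross-entropy on the labeled task only; other heads get no loss."""
    p = min(max(probs[task], eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


model = MultiTaskSketch(n_in=5, n_hidden=8)
probs = model.predict([0.2, -0.1, 0.5, 0.0, 0.3])
loss = masked_bce(probs, "RNA", 1)  # a binding residue from the RNA training set
```

In a real training loop, the gradient of this loss would update only the RNA head and the shared layer, which is one way binding information from one molecule type can reach the predictions for the others.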

Highlights

  • Predicting protein interactions with other molecules is critical for understanding biological processes and discovering drugs

  • The peptide dataset: The dataset was downloaded from a recent study (Taherzadeh G, et al, 2016), in which protein–peptide complex structures were extracted from the BioLiP protein–ligand database (Yang, et al, 2013), with the peptide ligands derived from the Protein Data Bank (PDB)

  • The structural properties were predicted by SPIDER 3.0 (Rhys Heffernan, 2018), including ASA (2): the accessible surface area (ASA) is the surface area of a biomolecule that is accessible to a solvent, and it reflects the functional importance of residues
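To make the input features concrete, here is a minimal sketch, assuming each feature group is stored as a per-residue matrix (one row per residue) and that the groups are concatenated column-wise before entering the network. The group names follow the results section (G-PSSM, G-HHM, G-SPD3); the helper `assemble_features`, the toy values, and the column counts are hypothetical and far smaller than real PSSM, HHM, and SPIDER3 outputs. The `include` argument shows how the same helper could build single-group and leave-one-group-out inputs.

```python
from typing import Dict, List, Optional, Sequence

Matrix = List[List[float]]  # one row per residue, one column per feature


def assemble_features(groups: Dict[str, Matrix],
                      include: Optional[Sequence[str]] = None) -> Matrix:
    """Concatenate the selected feature groups into one vector per residue.

    `include` picks and orders the groups; by default all groups are used
    in sorted-name order.
    """
    names = sorted(groups) if include is None else list(include)
    if not names:
        raise ValueError("at least one feature group is required")
    lengths = {len(groups[n]) for n in names}
    if len(lengths) != 1:
        raise ValueError("all feature groups must cover the same residues")
    n_res = lengths.pop()
    # Row i of the result is group_1[i] + group_2[i] + ... in `names` order.
    return [[v for n in names for v in groups[n][i]] for i in range(n_res)]


# Toy 3-residue protein; column counts are illustrative, not MTDsite's.
groups = {
    "G-PSSM": [[0.1, 0.9], [0.4, 0.2], [0.7, 0.3]],
    "G-HHM": [[0.5], [0.6], [0.1]],
    "G-SPD3": [[0.35], [0.80], [0.05]],  # toy stand-in for a SPIDER3 ASA value
}
full = assemble_features(groups, include=["G-PSSM", "G-HHM", "G-SPD3"])
no_spd3 = assemble_features(groups, include=["G-PSSM", "G-HHM"])
```

Here `full[0]` is `[0.1, 0.9, 0.5, 0.35]`, and dropping G-SPD3 simply removes its column from every residue's vector.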


Summary

Results

We evaluated the contributions of individual feature groups by using only a single feature group or by excluding one feature group from the full feature set. When a single feature group was used for prediction, G-PSSM, the evolutionary features produced by PSI-BLAST, yielded the highest average values of both AUC and MCC. G-HHM, another feature group produced by HHblits, yielded slightly lower AUC and MCC values. G-SPD3, the structural feature group produced by the SPIDER3 package, yielded significantly lower average AUC values. These results suggest the importance of evolutionary information for protein binding, consistent with previous findings (Hong Su, et al, 2018). When individual feature groups were excluded, removing G-PSSM caused the largest decreases in the average values of both AUC and MCC, again indicating its dominant role. The decreases caused by removing G-SPD3 are small, likely because the G-SPD3 features were derived from the PSSM and HHM profiles, so our neural networks could partly capture the structural information from those two profiles.
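The feature analysis compares AUC and MCC across single-group and leave-one-group-out settings. A minimal plain-Python sketch of both metrics and of the two kinds of settings is given here. AUC is computed through its rank (Mann-Whitney) interpretation; the 0.5 MCC threshold, the helper names, and the toy labels and scores are assumptions for illustration, since the text does not state how MTDsite binarizes its scores. Model training and averaging over the four molecule types are omitted.

```python
import math
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random binding residue (label 1) scores
    higher than a random non-binding residue (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both binding and non-binding residues")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)  # O(P*N), fine for a sketch
    return wins / (len(pos) * len(neg))


def mcc(labels: Sequence[int], scores: Sequence[float],
        threshold: float = 0.5) -> float:
    """Matthews correlation coefficient after thresholding the scores."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def ablation_settings(groups: Sequence[str]) -> List[Tuple[str, List[str]]]:
    """Single-group runs followed by leave-one-group-out runs."""
    singles = [(f"only {g}", [g]) for g in groups]
    dropped = [(f"without {g}", [h for h in groups if h != g]) for g in groups]
    return singles + dropped


labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
print(round(auc(labels, scores), 4))  # 0.8889
print(round(mcc(labels, scores), 4))  # 0.3333
```

With the groups `["G-PSSM", "G-HHM", "G-SPD3"]`, `ablation_settings` yields three single-group settings and three leave-one-out settings; each would be trained and scored with `auc` and `mcc`, and the metrics averaged over the molecule types.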

Introduction
Benchmark Datasets
Input Features
Performance evaluation
MTDsite architecture
Cross-Validation and Independent Test
Model selection and Parameters optimization
MTDsite-single Models
Contributions by the shared networks
Comparisons with other methods
Case study
Conclusion and Discussion
Method