Abstract

Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds. Here, we developed a method, C-QUARK, that integrates multiple deep-learning and coevolution-based contact-maps to guide the replica-exchange Monte Carlo fragment assembly simulations. The method was tested on 247 non-redundant proteins, where C-QUARK could fold 75% of the cases with TM-scores (template-modeling scores) ≥0.5, which was 2.6 times more than that achieved by QUARK. For the 59 cases that had either low contact accuracy or few homologous sequences, C-QUARK correctly folded 6 times more proteins than other contact-based folding methods. C-QUARK was also tested on 64 free-modeling targets from the 13th CASP (critical assessment of protein structure prediction) experiment and had an average GDT_TS (global distance test) score that was 5% higher than the best CASP predictors. These data demonstrate, in a robust manner, the progress in modeling non-homologous protein structures using low-accuracy and sparse contact-map predictions.
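The fold criterion above (TM-score ≥0.5) can be made concrete with a minimal sketch of the standard TM-score sum for one fixed superposition. The function name, its arguments, and the 0.5 Å floor on d0 for short chains are assumptions made for illustration. The full TM-score also maximizes over superpositions, which is omitted here.

```python
def tm_score_for_superposition(distances, target_length):
    """TM-score sum for ONE fixed superposition (illustrative sketch).

    distances: per-residue distances (in Angstrom) between aligned
        model and native residues; unaligned residues are left out.
    target_length: number of residues in the native (target) structure.
    """
    # Length-dependent scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    # Assumption: clamp to 0.5 so short chains do not yield d0 <= 0.
    d0 = 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)
    # Each aligned residue contributes 1 / (1 + (d / d0)^2).
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalizing by the target length penalizes unaligned residues.
    return total / target_length
```

In this sketch a model whose aligned residues cover half of a 100-residue target with zero deviation scores exactly at the 0.5 threshold.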

Highlights

  • Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds

  • Built on one of the top ab initio protein folding simulation programs, QUARK[5,30], C-QUARK starts with multiple sequence alignment (MSA) collection from whole-genome and metagenome sequence databases[32], where two types of contact-maps are created by deep-learning[29,33,34,35,36] and coevolution[26,37,38,39,40] based predictors

  • Structural fragments with continuous sequence lengths (1-20 AA) are collected from unrelated PDB structures and used to assemble full-length structure models by Replica-Exchange Monte Carlo (REMC) simulations under the guidance of a composite force field consisting of knowledge-based energy terms, inter-residue contacts collected from the structure fragments based on their distance profiles[30], and the sequence-based contact-map predictions (Fig. 1)
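The replica-exchange idea in the last highlight can be illustrated with a toy sketch. This is not C-QUARK's implementation: the one-dimensional state, the `toy_energy` function (a stand-in "knowledge-based" double well plus a weighted "contact" restraint), the temperatures, and all parameter values are assumptions chosen only to show the Metropolis move and the standard swap criterion between neighboring replicas.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Standard replica-exchange acceptance: min(1, exp((b_i - b_j)(E_i - E_j)))."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def toy_energy(x, contact_weight=1.0):
    """Hypothetical composite energy: generic term plus weighted restraint."""
    knowledge_term = (x * x - 1.0) ** 2   # double well with minima at x = -1, +1
    contact_term = (x - 1.0) ** 2         # restraint that favors x = +1
    return knowledge_term + contact_weight * contact_term


def remc(energy, temperatures, n_sweeps=2000, step=0.5, seed=0):
    """Run a toy REMC search and return the lowest-energy state seen."""
    rng = random.Random(seed)
    betas = [1.0 / t for t in temperatures]
    states = [rng.uniform(-3.0, 3.0) for _ in temperatures]
    energies = [energy(x) for x in states]
    best_state, best_energy = min(zip(states, energies), key=lambda p: p[1])
    for _ in range(n_sweeps):
        # One Metropolis move per replica at its own temperature.
        for k, beta in enumerate(betas):
            trial = states[k] + rng.uniform(-step, step)
            e_trial = energy(trial)
            diff = e_trial - energies[k]
            if diff <= 0 or rng.random() < math.exp(-beta * diff):
                states[k], energies[k] = trial, e_trial
                if e_trial < best_energy:
                    best_state, best_energy = trial, e_trial
        # Attempt one swap between a random pair of neighboring replicas.
        k = rng.randrange(len(betas) - 1)
        p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            states[k], states[k + 1] = states[k + 1], states[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return best_state, best_energy


best_state, best_energy = remc(toy_energy, temperatures=[0.1, 0.3, 1.0, 3.0])
```

In this toy setting the restraint term breaks the symmetry of the double well, so the search should settle near x = 1. High-temperature replicas cross the barrier and pass good states down to the cold replicas through swaps, which is the general motivation for using replica exchange on rugged energy landscapes.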

Summary

Introduction

Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds. C-QUARK was tested on 64 free-modeling targets from the 13th CASP (critical assessment of protein structure prediction) experiment and had an average GDT_TS (global distance test) score that was 5% higher than the best CASP predictors. These data demonstrate, in a robust manner, the progress in modeling non-homologous protein structures using low-accuracy and sparse contact-map predictions. When the number of homologous sequences and the accuracy of sequence-based contact prediction are low, how to balance the noisy contact-maps with the advanced folding simulation force fields to construct correct ab initio structure folds remains an important and challenging problem. The results demonstrate, in a robust manner, the critical importance of a balanced combination of multiple complementary contact restraints with an advanced knowledge-based force field for improving the accuracy of ab initio protein structure prediction, especially for targets with complicated folding topologies. Given the special role of contact-map prediction in protein folding and the fact that most of the predicted distances and orientations are on residue pairs within short distances of each other (i.e., in contact), we believe it is of critical importance to study and benchmark separately the impact of contact-maps on the problem of ab initio protein structure prediction, and to systematically examine the critical weaknesses and strengths of deep-learning contact restraints when coupled with advanced protein folding simulation algorithms.
