Abstract

Ab initio protein folding using predicted contacts as restraints has recently made some progress, but it requires accurate contact prediction, which existing methods achieve only for some large protein families with thousands of sequence homologs. To improve contact prediction for small protein families, we employ deep learning, a technique from computer science that can learn complex patterns from large datasets and has revolutionized object and speech recognition, machine translation and the game of Go. Our deep learning model for contact prediction consists of two deep residual neural networks. The first learns the relationship between contacts and sequential features (residue conservation and predicted secondary structure) from thousands of protein families, while the second learns the occurrence patterns of contacts and their relationship with pairwise features such as contact potential, residue co-evolution strength and the output of the first network. Experimental results suggest that our deep learning method greatly improves contact prediction and contact-assisted folding, especially for small protein families. Tested on 579 proteins dissimilar to the training proteins, the average top L (where L is the sequence length) long-range prediction accuracy of our method, the representative direct evolutionary coupling method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy is 0.77, 0.47 and 0.59, respectively. Even without using force fields, our predicted contacts allow us to correctly fold 203 test proteins, while MetaPSICOV and CCMpred contacts can fold only 79 and 62 proteins, respectively. In three weeks of blind testing with the weekly benchmark CAMEO (http://www.cameo3d.org/), our method successfully folded three large hard targets with a new fold and only 1.3L-2.3L sequence homologs, while all template-based methods failed.
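
The top L and top L/10 long-range accuracies quoted above can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes the common convention that a residue pair is long-range when its sequence separation is at least 24, and that the native contact map is supplied as a boolean matrix; the function name `top_k_long_range_accuracy`, the `min_sep` parameter and the toy data are all hypothetical.

```python
import numpy as np


def top_k_long_range_accuracy(prob, contact, k, min_sep=24):
    """Fraction of the k highest-scoring long-range pairs that are true contacts.

    prob    -- (L, L) matrix of predicted contact probabilities
    contact -- (L, L) boolean matrix of native contacts
    k       -- number of top predictions to evaluate (e.g. L or L // 10)
    min_sep -- minimum sequence separation |i - j| for "long-range"
               (24 is an assumed convention, not taken from the abstract)
    """
    prob = np.asarray(prob, dtype=float)
    contact = np.asarray(contact, dtype=bool)
    n = prob.shape[0]
    # Each residue pair (i, j) with j - i >= min_sep, counted once.
    i, j = np.triu_indices(n, k=min_sep)
    if i.size == 0 or k <= 0:
        raise ValueError("no long-range pairs to evaluate")
    k = min(k, i.size)
    # Indices of the k pairs with the highest predicted probability.
    order = np.argsort(-prob[i, j], kind="stable")[:k]
    return float(contact[i[order], j[order]].mean())


# Toy example with a hypothetical 30-residue protein.
L = 30
prob = np.zeros((L, L))
contact = np.zeros((L, L), dtype=bool)
prob[0, 29], prob[1, 28], prob[2, 27] = 0.9, 0.8, 0.7  # long-range predictions
contact[0, 29] = contact[2, 27] = True                  # two of them are native
prob[5, 6], contact[5, 6] = 0.99, True                  # short-range pair, ignored

acc = top_k_long_range_accuracy(prob, contact, k=L // 10)
print(f"top L/10 long-range accuracy: {acc:.3f}")
```

Averaging this quantity over all test proteins, with `k` set to L or L // 10, gives figures of the kind reported in the abstract under the stated assumptions.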
