Abstract

The routine prediction of three-dimensional protein structure from sequence remains a challenge in computational biochemistry. Based on previous performance with small proteins, it has been assumed that energies calculated with physics-based scoring functions can distinguish native from nonnative folds and that conformational sampling is the fundamental bottleneck to successful folding. We demonstrate that as protein size increases, errors in the computed energies become a significant problem. We show, by using error probability density functions, that physics-based scores contain significant systematic and random errors relative to accurate reference energies. These errors propagate throughout an entire protein and distort its energy landscape to such an extent that modern scoring functions should have little chance of success in finding the free energy minima of large proteins. Nonetheless, once the errors in physics-based score functions are understood, they can be reduced in a post hoc manner, improving the accuracy of energy computation and fold discrimination.

Highlights

  • A widely studied and yet largely unsolved problem in computational biochemistry is the ab initio protein-folding problem – the prediction of three-dimensional protein structure from an amino acid sequence [1,2]

  • Energies were evaluated with the ff99sb force field [33], the Generalized Amber Force Field (GAFF) [35], ff03 [36], AM1 [37], PM3 [38], PM6 [39], PDDG [40], PM6-DH2 [41], HF, MP2, B97-D [42], M06, and M06-L [43]

  • Folded proteins are characterized by numerous van der Waals and hydrogen bonding interactions that need to be accurately accounted for when using physics-based score functions

Summary

Introduction

A widely studied and yet largely unsolved problem in computational biochemistry is the ab initio protein-folding problem – the prediction of three-dimensional protein structure from an amino acid sequence [1,2]. The basis of any physics-based method used to study protein folding is the thermodynamic hypothesis: that the biologically active (native) fold is a free energy minimum [3]. Although this is the most widely used paradigm, there are a few known exceptions to the rule [4,5]. Monte Carlo-based search and minimization techniques are employed in conjunction with physics-based potentials [10]. These and other physics-based methods have had difficulty in correctly predicting the folds of chains longer than 100 amino acids [11,12].
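To illustrate why errors that propagate through an entire protein matter more as chain length grows, the following is a minimal sketch and not the authors' method. It assumes, purely for illustration, that each pairwise interaction carries an independent Gaussian error with a systematic component `mu` and a random component `sigma` (both hypothetical values in kcal/mol, not taken from the paper). Under that assumption the systematic error of the summed energy grows linearly with the number of interactions and the random error grows with its square root.

```python
import math
import random


def propagated_error(n_interactions, mu, sigma):
    """Analytic error of a sum of n independent per-interaction errors.

    Assumption (not from the source): every interaction has the same mean
    error `mu` and the same standard deviation `sigma`.
    Returns (systematic_error, random_error).
    """
    systematic = n_interactions * mu
    random_err = math.sqrt(n_interactions) * sigma
    return systematic, random_err


def simulate_total_errors(n_interactions, mu, sigma, n_trials=2000, seed=0):
    """Monte Carlo check: draw per-interaction errors and sum them.

    Returns the sample mean and sample standard deviation of the total
    error over `n_trials` independent draws.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(mu, sigma) for _ in range(n_interactions))
        for _ in range(n_trials)
    ]
    mean = sum(totals) / n_trials
    variance = sum((t - mean) ** 2 for t in totals) / (n_trials - 1)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical per-interaction error parameters, chosen only for illustration.
    mu, sigma = 0.5, 1.0
    for n in (10, 100, 1000):
        systematic, random_err = propagated_error(n, mu, sigma)
        print(f"n={n:5d}  systematic={systematic:8.1f}  random={random_err:6.2f}")
```

In this toy model, the accumulated error of a protein with many interactions can become comparable to the energy gap between native and nonnative folds, which matches the qualitative argument made in the abstract. Real per-interaction errors need not be Gaussian, identical, or independent.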

