Abstract

Ab initio protein folding simulation largely depends on knowledge-based energy functions that are derived from known protein structures using statistical methods. These knowledge-based energy functions provide a good approximation of real protein energetics. However, they are not very informative for search algorithms, because they fail to distinguish the types of amino acid interactions that contribute most to the energy from those that contribute little. As a result, search algorithms frequently get trapped in local minima. The hydrophobic–polar (HP) model, on the other hand, considers hydrophobic interactions only, and the simplified nature of its energy function restricts it to low-resolution modelling. In this paper, we present a strategy to derive a non-uniformly scaled version of the real 20×20 pairwise energy function. The non-uniform scaling addresses the difficulty faced by the real energy function, whereas the use of 20×20 pairwise information overcomes the limitations of the HP energy function. We apply the derived energy function within a genetic algorithm on discrete lattices. On a standard set of benchmark protein sequences, our approach significantly outperforms state-of-the-art methods for similar models, and it explores regions of the conformational space that all previous methods have failed to explore. The effectiveness of the derived energy function is shown through qualitative differences and similarities between the sampled structures and the native structures. The number of objective function evaluations in a single run of the algorithm is used as a comparison metric to demonstrate efficiency.
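Both energy models contrasted above, the HP model and the 20×20 pairwise functions, share one contact-based form: sum a pair term over residues that sit on neighbouring lattice sites but are not consecutive in the chain. The sketch below assumes a simple cubic lattice, where neighbours are at Manhattan distance 1; the names `contact_energy` and `hp_energy` are hypothetical, and the lattice used in the paper may differ (a face-centred cubic lattice, for instance, needs a different neighbour test).

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Coord = Tuple[int, int, int]
PairEnergy = Callable[[str, str], float]


def contact_energy(seq: str, coords: Sequence[Coord], pair_energy: PairEnergy) -> float:
    """Sum pair_energy over topological contacts on a simple cubic lattice (assumed).

    A topological contact is a pair of residues that occupy neighbouring
    lattice sites (Manhattan distance 1) but are not consecutive in the chain.
    """
    if len(seq) != len(coords):
        raise ValueError("need exactly one coordinate per residue")
    total = 0.0
    for i, j in combinations(range(len(seq)), 2):
        if j - i < 2:
            continue  # bonded neighbours are always adjacent; not a contact
        dist = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
        if dist == 1:
            total += pair_energy(seq[i], seq[j])
    return total


def hp_energy(a: str, b: str) -> float:
    """HP model: each H-H contact contributes -1; every other pair contributes 0."""
    return -1.0 if a == "H" and b == "H" else 0.0


# A unit square: residues 0 and 3 are lattice neighbours, forming one H-H contact.
print(contact_energy("HPPH", [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)], hp_energy))  # -1.0
```

Passing a 20×20 lookup such as `lambda a, b: table[a][b]` in place of `hp_energy` turns the same routine into a full pairwise energy; the HP model is then just the special case in which only the H-H entry is non-zero.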

Highlights

  • Worldwide genome projects for large-scale DNA sequencing have determined massive amounts of primary structure data

  • Often the search methods in reduced models are guided by contact-based energy functions, and the conformations, i.e. the amino acid positions, are restricted to discretized lattices

  • The first seven protein sequences are taken from the Protein Data Bank (PDB) and were used in the work of Ullah et al. [11]
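One common way to restrict residue positions to a discretized lattice, as the highlights describe, is to encode a conformation as a string of absolute moves and decode it into lattice coordinates. The sketch below assumes a simple cubic lattice and a hypothetical six-letter move alphabet (`R`, `L`, `U`, `D`, `F`, `B`); the paper's own encoding and lattice are not described in this excerpt.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]

# Hypothetical absolute-move alphabet: one unit step along each cubic-lattice axis.
MOVES: Dict[str, Coord] = {
    "R": (1, 0, 0), "L": (-1, 0, 0),
    "U": (0, 1, 0), "D": (0, -1, 0),
    "F": (0, 0, 1), "B": (0, 0, -1),
}


def decode(moves: str) -> List[Coord]:
    """Place residue 0 at the origin and follow one unit move per bond."""
    x, y, z = 0, 0, 0
    coords = [(x, y, z)]
    for m in moves:
        dx, dy, dz = MOVES[m]
        x, y, z = x + dx, y + dy, z + dz
        coords.append((x, y, z))
    return coords


def is_self_avoiding(coords: List[Coord]) -> bool:
    """A valid lattice conformation never puts two residues on the same site."""
    return len(set(coords)) == len(coords)
```

A chain of n residues needs n − 1 moves; for example, `decode("RUL")` traces a unit square, while `decode("RL")` folds back onto the origin and fails `is_self_avoiding`. A genetic algorithm working on such strings would typically discard or repair offspring that fail this check.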

Introduction

Worldwide genome projects for large-scale DNA sequencing have determined massive amounts of primary structure data. Computational protein structure prediction approaches can be broadly classified into three categories: homology modelling, threading (or fold recognition) and ab initio methods. A large number of physics-based or knowledge-based statistical energy functions [5,6] have been used in protein structure prediction together with different methods and algorithms. More elaborate energy functions that consider 20 × 20 pairwise amino acid interactions are derived by applying statistical methods to X-ray crystallography and NMR data [8,9]. These energy functions are much more informative in the later stages of the search but are less informative for guiding the search away from local minima. If a contact with a large energy magnitude can be formed early on by sacrificing some contacts with small energy magnitudes, this increases the degrees of freedom available to the later search stages, when it becomes possible to try to re-establish the sacrificed contacts.
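The trade-off described above, giving up several weak contacts to form one strong contact, can be made concrete with a non-uniform rescaling of a pairwise table. The paper's actual scaling is not given in this excerpt; the sign-preserving power transform below is a swapped-in illustration, and `scale_nonuniform`, the exponent `p` and the toy table values are all hypothetical.

```python
import math
from typing import Dict, Tuple

Pair = Tuple[str, str]


def scale_nonuniform(table: Dict[Pair, float], p: float = 2.0) -> Dict[Pair, float]:
    """Hypothetical sign-preserving power scaling of a pairwise energy table.

    Each entry e becomes sign(e) * M * (|e| / M) ** p, where M is the largest
    magnitude in the table. For p > 1, small-magnitude entries shrink relative
    to large ones, while the strongest entry keeps its value and every entry
    keeps its sign and rank.
    """
    m = max(abs(e) for e in table.values())
    if m == 0:
        return dict(table)
    return {k: math.copysign(m * (abs(e) / m) ** p, e) for k, e in table.items()}


# Toy values for illustration only; they are not taken from any real matrix.
toy = {("A", "B"): -1.0, ("C", "D"): -1.5}
scaled = scale_nonuniform(toy, p=2.0)
```

With the unscaled toy table, two A-B contacts (-2.0) beat one C-D contact (-1.5). After scaling with p = 2, the two A-B contacts sum to about -1.33 while the C-D contact stays at -1.5, so a search guided by the scaled table would prefer forming the single strong contact first.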
