Abstract
Ab initio protein folding simulation largely depends on knowledge-based energy functions that are derived from known protein structures using statistical methods. These knowledge-based energy functions provide us with a good approximation of real protein energetics. However, these energy functions are not very informative for search algorithms and fail to distinguish the types of amino acid interactions that contribute largely to the energy function from those that do not. As a result, search algorithms frequently get trapped in local minima. On the other hand, the hydrophobic–polar (HP) model considers hydrophobic interactions only. The simplified nature of the HP energy function limits it to a low-resolution model. In this paper, we present a strategy to derive a non-uniformly scaled version of the real 20×20 pairwise energy function. The non-uniform scaling helps tackle the difficulty faced by a real energy function, whereas the integration of 20×20 pairwise information overcomes the limitations faced by the HP energy function. Here, we have applied the derived energy function with a genetic algorithm on discrete lattices. On a standard set of benchmark protein sequences, our approach significantly outperforms the state-of-the-art methods for similar models. Our approach has been able to explore regions of the conformational space that all previous methods have failed to explore. The effectiveness of the derived energy function is demonstrated by showing qualitative differences and similarities between the sampled structures and the native structures. The number of objective function evaluations in a single run of the algorithm is used as a comparison metric to demonstrate efficiency.
Highlights
Worldwide genome projects for large-scale DNA sequencing have determined massive amounts of primary structure data
Often the search methods in the reduced models are guided by contact-based energy functions, and the conformations, i.e. the amino acid positions, are restricted to discretized lattices
The first seven protein sequences are taken from the Protein Data Bank (PDB) and were used in the work of Ullah et al. [11]
Summary
Worldwide genome projects for large-scale DNA sequencing have determined massive amounts of primary structure data. Computational protein structure prediction approaches can be broadly classified into three categories: homology modelling, threading or fold recognition, and ab initio methods. A large number of physics-based or knowledge-based statistical energy functions [5,6] have been used in protein structure prediction along with different methods and algorithms. More elaborate energy functions that consider 20 × 20 pairwise amino acid interactions are derived by applying statistical methods to X-ray crystallography and NMR data [8,9]. These energy functions are much more informative for the later stages of the search but are less informative for guiding the search to avoid local minima. If a contact with a large energy magnitude can be formed early by sacrificing some contacts with small energy magnitudes, this increases the degrees of freedom for the later search stages, when the sacrificed contacts can be re-established.
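To make the idea of a contact-based energy function concrete, the following is a minimal sketch, not the paper's implementation. It assumes a simple cubic lattice (the text only says "discrete lattices"), treats two residues as being in contact when they are lattice neighbours but not adjacent in the chain, and assumes the conformation passed in is already self-avoiding. The names `contact_energy`, `hp_energy`, `table_energy` and the values in `TOY_TABLE` are hypothetical and chosen only for illustration; they are not taken from any published energy matrix.

```python
from itertools import combinations


def contact_energy(sequence, coords, pair_energy):
    """Sum pair_energy over residue pairs that are lattice neighbours
    but not neighbours along the chain (assumed cubic lattice)."""
    total = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i == 1:
            continue  # consecutive residues are bonded, not in contact
        # On a cubic lattice, a contact is a Manhattan distance of exactly 1.
        dist = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
        if dist == 1:
            total += pair_energy(sequence[i], sequence[j])
    return total


def hp_energy(a, b):
    """HP model: only hydrophobic-hydrophobic contacts are rewarded."""
    return -1.0 if a == "H" and b == "H" else 0.0


# Hypothetical toy table standing in for a 20x20 pairwise matrix.
# Keys are alphabetically sorted residue pairs; values are made up.
TOY_TABLE = {("A", "A"): -1.0, ("A", "C"): -2.0, ("C", "C"): -5.0}


def table_energy(a, b):
    """Look up a pairwise energy in the symmetric toy table."""
    return TOY_TABLE[tuple(sorted((a, b)))]


# A four-residue chain folded into a square in the z = 0 plane:
# residues 0 and 3 end up as lattice neighbours.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
# The same chain fully extended: no contacts at all.
line = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]

hp_folded = contact_energy("HPPH", square, hp_energy)
hp_extended = contact_energy("HPPH", line, hp_energy)
toy_folded = contact_energy("CAAC", square, table_energy)
```

In this toy setting the HP function rewards the single H-H contact with one unit, while the pairwise table weights the same geometric contact according to the residue types involved, which is the extra information a 20 × 20 function carries.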