Abstract
Background: Automatic protein modelling pipelines are becoming ever more accurate; this has come hand in hand with an increasingly complicated interplay between all components involved. Nevertheless, there are still potential improvements to be made in template selection, refinement and protein model selection.
Results: In the context of an automatic modelling pipeline, we analysed each step separately, which revealed several non-intuitive trends, and explored a new strategy for protein conformation sampling using Genetic Algorithms (GA). We apply the concept of alternating evolutionary pressure (AEP), i.e. intermediate rounds within the GA runs where unrestrained, linear growth of the model populations is allowed.
Conclusion: This approach improves the overall performance of the GA by allowing models to overcome local energy barriers. AEP enabled the selection of the best models in 40% of all targets, compared with 25% for a normal GA.
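To make the idea of alternating evolutionary pressure concrete, the following is a minimal toy sketch, not the pipeline described in this paper. It assumes a one-dimensional rugged energy function in place of a protein scoring function, and every name and parameter (`energy`, `make_child`, `run_ga`, `growth`, `relax_every`) is hypothetical. In relaxed rounds a fixed number of children is added and nothing is removed, so the population grows linearly; every `relax_every`-th round applies selection and trims the population back to its base size.

```python
import math
import random


def energy(x):
    """Toy rugged 1-D landscape (an assumed stand-in, not a protein energy)."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))


def make_child(population, rng, sigma=0.3):
    """Blend two randomly chosen parents and add Gaussian noise."""
    a, b = rng.sample(population, 2)
    w = rng.random()
    return w * a + (1.0 - w) * b + rng.gauss(0.0, sigma)


def run_ga(pop_size=20, rounds=60, growth=10, relax_every=3, seed=0):
    """Toy GA with periodic relaxed rounds; relax_every=0 gives a plain GA.

    Returns the final population and the population size after each round.
    """
    rng = random.Random(seed)
    population = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]
    sizes = []
    for r in range(1, rounds + 1):
        # Every round adds a fixed number of children (linear growth).
        children = [make_child(population, rng) for _ in range(growth)]
        population = population + children
        # Relaxed round: skip selection so higher-energy models survive.
        relaxed = relax_every > 0 and r % relax_every != 0
        if not relaxed:
            # Selective round: keep only the lowest-energy models.
            population = sorted(population, key=energy)[:pop_size]
        sizes.append(len(population))
    return population, sizes


if __name__ == "__main__":
    aep_pop, aep_sizes = run_ga()
    plain_pop, _ = run_ga(relax_every=0)
    print("population sizes, first six rounds:", aep_sizes[:6])
    print("best energy with relaxed rounds:", min(map(energy, aep_pop)))
    print("best energy with plain GA:", min(map(energy, plain_pop)))
```

Which variant finds the lower energy on this toy landscape depends on the seed and parameters; the sketch only illustrates the mechanism of suspending selection so that intermediate, higher-energy models can persist long enough to cross a barrier.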
Highlights
We considered all sequences from the seventh round of CASP [43], which were downloaded from the Protein Structure Prediction Center webpage.
For this study we modelled 75 diverse protein sequences from the template-based modelling category of the CASP7 dataset.
Summary
Automatic protein modelling pipelines are becoming ever more accurate; this has come hand in hand with an increasingly complicated interplay between all components involved. There are still potential improvements to be made in template selection, refinement and protein model selection. Although impressive progress in protein structure modelling has been achieved over the last decade, improvement between subsequent rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP) is often considered to be modest [1,2]. Protein models are useful for qualitative analysis and decision-making in support of a wide range of experimental work. However, modelling techniques are still not accurate enough to close the gap between known protein sequences (approximately 5 million non-redundant) and solved protein structures (approximately 50,000). First, the gap between the quality of fully automated and manual modelling techniques has narrowed; second, improvement beyond the best template is achieved more frequently.