Abstract
Since Anfinsen demonstrated in 1973 that the information encoded in a protein’s amino acid sequence determines its structure, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction is to use computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have historically been categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM was the most reliable approach to predicting protein structures, and in the absence of reliable templates, modeling accuracy declined sharply. Nevertheless, the results of the most recent community-wide Critical Assessment of protein Structure Prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using PDB templates. Critically, model quality exhibited little correlation with either the quality of available template structures or the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.
Highlights
The gap between template-based modeling (TBM) and template-free modeling (FM) accuracies has been largely reduced
Most structure prediction studies have focused on distant-homology modeling, in which close homologous templates must be excluded to facilitate benchmark testing and comparisons with other methods
Both traditional TBM/FM and modern deep learning methods rely essentially on experimentally solved structures and are impacted by the increase in the number of structures in the Protein Data Bank (PDB)
The prediction of protein structures starting from amino acid sequences has remained an outstanding problem in structural biology since Anfinsen first demonstrated that the information encoded in a protein sequence determines its structure
The most reliable approach for solving the protein structure prediction problem has been to identify and refine the structural frameworks of templates detected from the PDB
Summary
Robin Pearce and Yang Zhang (1, 2, *)
From the (1) Department of Computational Medicine and Bioinformatics and (2) Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA