Abstract

Since Anfinsen demonstrated in 1973 that the information encoded in a protein’s amino acid sequence determines its structure, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction is to use computational modeling to determine the spatial location of every atom in a protein molecule, starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have historically been categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM was the most reliable approach to predicting protein structures, and in the absence of reliable templates, modeling accuracy declined sharply. Nevertheless, the results of the most recent community-wide Critical Assessment of protein Structure Prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through end-to-end deep machine learning techniques: correct folds could be built for nearly all single-domain proteins without using PDB templates. Critically, model quality showed little correlation with either the quality of available template structures or the number of sequence homologs detected for a given target protein. Thus, deep learning has essentially broken through the 50-year-old boundary between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.

Highlights

  • The gap between template-based modeling (TBM) and template-free modeling (FM) accuracies has been greatly reduced. Most structure prediction studies have focused on distant-homology modeling, in which close homologous templates must be excluded to facilitate benchmark testing and comparison with other methods. Both traditional TBM/FM and modern deep learning methods rely essentially on experimentally solved structures and are affected by the growth in the number of structures in the Protein Data Bank (PDB)

  • The prediction of protein structures starting from amino acid sequences has remained an outstanding problem in structural biology since Anfinsen first demonstrated that the information encoded in a protein sequence determines its structure

  • The most reliable approach for solving the protein structure prediction problem has been to identify and refine the structural frameworks of templates detected from the PDB


Summary

Toward the solution of the protein structure prediction problem

Robin Pearce and Yang Zhang¹,²,*
From the ¹Department of Computational Medicine and Bioinformatics and ²Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA

Edited by Wolfgang Peti
An overview of the history of protein structure prediction
Pairwise spatial restraint prediction
Contact map prediction using shallow machine learning approaches
Contact map prediction using deep neural networks
Distance map prediction using deep learning
Incorporating metagenomic sequence data into prediction approaches
Unsupervised contact map prediction using transformers
Impact of deep learning on structure modeling accuracy
Improving contact prediction accuracy using deep learning
Improving tertiary structure modeling using deep learning
Findings
Conclusion and future directions