Abstract

Obtaining an accurate description of protein structure is a fundamental step toward understanding the underpinnings of biology. Although recent advances in experimental approaches have greatly enhanced our ability to determine protein structures, the gap between the number of protein sequences and the number of known protein structures keeps widening. Computational protein structure prediction is one way to fill this gap. Recently, the protein structure prediction field has advanced considerably due to Deep Learning (DL)-based approaches, as evidenced by the success of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progress in the field of protein structure prediction due to DL-based methods as observed in CASP experiments. We describe advances in various steps of the protein structure prediction pipeline, viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and Quality Assessment/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction. Additionally, as there have been some recent DL-based advances in protein structure determination using cryo-electron microscopy (Cryo-EM), we also highlight some of the important progress in that field. Finally, we provide an outlook and possible future research directions for DL-based approaches in the protein structure prediction arena.
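The three prediction targets named in the abstract (contact map, distogram, real-valued distance) are different views of the same inter-residue distance matrix. The following is a minimal sketch of that relationship, not any method from the article. It assumes the commonly used 8 Å contact threshold; the toy coordinates, the bin edges, and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def pairwise_distances(coords):
    """Return the L x L matrix of Euclidean distances (the real-valued target)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def contact_map(dist, threshold=8.0):
    """Binary map: 1 where two residues are closer than the threshold (angstroms)."""
    return (dist < threshold).astype(int)

def distogram_bins(dist, edges):
    """Assign each distance to a bin index; a distogram is a probability over such bins."""
    return np.digitize(dist, edges)

# Toy coordinates: four residues on a line, 5 angstroms apart (illustrative only).
coords = np.array([[0.0, 0.0, 0.0],
                   [5.0, 0.0, 0.0],
                   [10.0, 0.0, 0.0],
                   [15.0, 0.0, 0.0]])

dist = pairwise_distances(coords)
contacts = contact_map(dist)
# Hypothetical bin edges; real methods use many more, finer bins.
bins = distogram_bins(dist, edges=np.array([4.0, 8.0, 12.0, 16.0]))
```

A contact map keeps one bit per residue pair, a distogram keeps a coarse distance class, and real-valued prediction keeps the distance itself, which is why the three tasks are usually presented in that order of increasing information.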

Highlights

  • Three sets of coevolutionary features, viz. covariance features (COV), precision matrix features (PRE), and a coupling parameter matrix approximated by pseudolikelihood maximization (PLM), are extracted from the deep multiple sequence alignments (MSAs) created in step 1, hence the name TripletRes

  • As in previous iterations of I-TASSER, the C-I-TASSER pipeline consists of the following steps. (a) Given a protein sequence, the sequence is threaded using LOMETS and, at the same time, an MSA is generated using DeepMSA. (b) Template fragments are created from the threading templates and subjected to structure assembly using Replica-Exchange Monte Carlo (REMC), guided by the potential calculated from the improved contact map created using NeBcon

  • We are in an exciting era for protein structure prediction, largely because of the advances made possible by Deep Learning
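The covariance (COV) and precision matrix (PRE) features mentioned in the first highlight can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the TripletRes implementation: it skips sequence weighting, omits the PLM feature entirely, and uses a hypothetical ridge value (`shrinkage`) and a toy four-sequence alignment.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus a gap symbol

def one_hot_msa(msa):
    """Encode an MSA (list of equal-length strings) as an N x (L*21) binary matrix."""
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    n, length = len(msa), len(msa[0])
    x = np.zeros((n, length, len(ALPHABET)))
    for s, seq in enumerate(msa):
        for pos, aa in enumerate(seq):
            # Unknown symbols are treated as gaps (an assumption of this sketch).
            x[s, pos, index.get(aa, index["-"])] = 1.0
    return x.reshape(n, length * len(ALPHABET))

def covariance_features(x):
    """Sample covariance between every (position, residue type) pair."""
    centered = x - x.mean(axis=0)
    return centered.T @ centered / x.shape[0]

def precision_features(cov, shrinkage=0.1):
    """Inverse of the covariance after adding a ridge term so it is invertible."""
    return np.linalg.inv(cov + shrinkage * np.eye(cov.shape[0]))

# Toy alignment of four sequences of length three (illustrative only).
msa = ["ACD", "ACE", "GCD", "A-D"]
x = one_hot_msa(msa)
cov = covariance_features(x)
pre = precision_features(cov)
```

The raw covariance mixes direct and indirect correlations between positions, while its regularized inverse emphasizes direct couplings, which is one reason both matrices can be useful as separate network inputs.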


Summary

Introduction

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

After the CASP13 competition, it was evident that evolutionary information captured in multiple sequence alignments (MSAs) was the most important input for structure prediction, and all research groups have their own in-house methods for generating MSAs. The problem of generating high-quality alignments for difficult sequences remains a huge challenge. There is no comprehensive review that focuses on the DL-based advances in the various steps of the protein structure prediction pipeline. We highlight DL-based advances in each step of the protein structure prediction pipeline, viz. advances in MSA generation, contact map prediction, protein residue–distance prediction, potentials to guide iterative fragment assembly, model quality assessment (QA), advances in overall protein prediction pipelines, and advances in Cryo-EM based protein structure determination, and we close with the future outlook for the protein structure prediction field.

Deep Learning-Based Advances in Various Steps of Protein Structure
Advances in Approaches for Multiple Sequence Alignment
DL-Based Advances in Protein Contact Map Prediction
RaptorX-Contact
ResPre
MapPred
DEEPCON
DeepECA
ContactGAN
InterPretContactMap
TripletRes
Summary of Advances in DL-Based Approaches for Protein Contact Map Prediction
Deep Learning-Based Advances in ‘Distogram Prediction’
Distogram
ProSPr
Distogram Prediction in trRosetta
AttentiveDist
Summary of Deep Learning-Based Advances in ‘Distogram Prediction’
Deep Learning-Based Advances in ‘Real-Valued Distance Prediction’
GAN-Based Real-Valued Distance Prediction Method
Xu’s Real-Valued Distance Prediction Method
RealDist
DeepDist
DISTEVAL
Summary of Deep Learning-Based Advances in ‘Real-Valued Distance Prediction’
ResNetQA
MULTICOM EMA Predictors
DeepAccNet
Summary
AlphaFold
RaptorX
MULTICOM
AlQuraishi’s Recurrent Geometric Network
AlphaFold2
Advances in Deep Learning-Based Approaches for Cryo-EM Protein
Deep Learning Approaches for Single Particle Picking
CASSPER
MicroGraphCleaner
AutoCryoPicker
Deep Learning-Based Approaches for Prediction of Backbone in Cryo-EM
Deep Learning Approaches for All-Atom Structure of a Protein Complex
Deep Learning-Based Approach for Protein Dynamics Information from Cryo-EM
EMRefiner
SuperEM
Future Outlook and Conclusions
Better Deep Learning-Based Algorithms for MSA Generation
Transformer Based Open-Source Approaches for Protein Structure Prediction
Findings
Explainable AI Approaches