Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14

Xiao Chen, Tianqi Wu, Zhiye Guo, Jie Hou, Jian Liu, Jianlin Cheng

doi:10.1038/s41598-021-90303-6

Open Access

https://doi.org/10.1038/s41598-021-90303-6

Journal: Scientific Reports	Publication Date: May 25, 2021
Citations: 10	License type: open-access

Affiliation: University of Missouri, Saint Louis University

Abstract

Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To leverage the improved inter-residue distance predictions for EMA, during the 2020 CASP14 experiment we integrated several new inter-residue distance features with existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. Evaluated by their ability to select the best model from the models of CASP14 targets, our three multi-model EMA predictors (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieve average losses of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS). The three methods rank first, second, and third among all 68 CASP14 predictors. MULTICOM-DEEP, the single-model EMA predictor, ranks within the top 10 among all single-model EMA methods according to GDT-TS loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully explore its potential.
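The loss figures quoted above can be made concrete with a small sketch. It assumes the usual CASP-style definition, which this page does not spell out: for each target, the loss is the true GDT-TS of the best available model minus the true GDT-TS of the model the predictor ranks first, and the reported value is the mean over targets. The function names and the toy scores are hypothetical.

```python
from statistics import mean


def gdt_ts_loss(predicted, true_gdt_ts):
    """Ranking loss for one target.

    predicted:   dict mapping model name -> predicted quality score
    true_gdt_ts: dict mapping model name -> true normalized GDT-TS (0 to 1)

    Assumed definition: true score of the best model minus the true
    score of the model ranked first by the predictor.
    """
    top_ranked = max(predicted, key=predicted.get)
    best_true = max(true_gdt_ts.values())
    return best_true - true_gdt_ts[top_ranked]


def average_loss(targets):
    """Mean loss over (predicted, true_gdt_ts) pairs, one pair per target."""
    return mean(gdt_ts_loss(pred, true) for pred, true in targets)


# Toy data (hypothetical, not CASP14 results).
toy_targets = [
    ({"m1": 0.80, "m2": 0.60}, {"m1": 0.70, "m2": 0.75}),  # picks m1, best is m2
    ({"a": 0.50, "b": 0.90}, {"a": 0.40, "b": 0.60}),      # picks b, which is best
]
toy_average = average_loss(toy_targets)
```

A loss of 0 means the predictor placed the truly best model first for that target, so lower averages indicate better model selection.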

Highlights

Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13)
The domains are classified into three categories: (1) template-based modeling (TBM) domains, the regular domains that have known structure templates in the Protein Data Bank (PDB)[43] (TBM domains are further classified into TBM-easy and TBM-hard according to the difficulty of predicting their tertiary structures); (2) free modeling (FM) domains, the very hard domains that have no known structure templates in the PDB; and (3) intermediate (FM/TBM) domains, which may have some very weak templates that existing template-identification methods cannot recognize
The results show that the global distance test score (GDT-TS) loss is generally lower on easier targets than on harder targets for all the MULTICOM EMA predictors, indicating that it is still easier to rank the models of easy targets than those of hard targets

Summary

Introduction

Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To leverage the improved inter-residue distance predictions for EMA, during the 2020 CASP14 experiment we integrated several new inter-residue distance features with existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. In CASP13, inter-residue contact information and deep learning were key for DeepRank[17] to achieve the best performance in ranking protein structural models, with the minimum loss of GDT-TS score[18]. To investigate how residue-residue distance/contact features may improve protein model quality assessment with deep learning, we developed several EMA predictors to evaluate different ways of using contact and distance predictions as features in the 2020 CASP14 experiment. All the methods use deep learning to predict a normalized GDT-TS score for a model of a target, which estimates the quality of the model on a scale from 0 (worst) to 1 (best).
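As a rough illustration of what a distance-based feature could look like, the sketch below compares the residue-residue distance map of a structural model with a predicted distance map. The choice of Pearson correlation, the function names, and the toy coordinates are all assumptions made for illustration; this summary does not state which distance features the MULTICOM predictors actually use.

```python
import math


def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates (x, y, z)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def distance_agreement(model_coords, predicted_map, min_separation=1):
    """Hypothetical feature: correlation between the model's distance map
    and a predicted distance map, over residue pairs (i, j) with
    j - i >= min_separation."""
    model_map = distance_map(model_coords)
    n = len(model_coords)
    pairs = [
        (model_map[i][j], predicted_map[i][j])
        for i in range(n)
        for j in range(i + min_separation, n)
    ]
    if not pairs:
        return 0.0
    model_d, predicted_d = zip(*pairs)
    return pearson(model_d, predicted_d)


# Toy example: four residues on a line, 3.8 units apart (hypothetical data).
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_predicted = distance_map(toy_coords)  # a "perfect" prediction for this model
toy_feature = distance_agreement(toy_coords, toy_predicted)
```

In a setup like this, a scalar such as `toy_feature` would be one input among many to the deep network, alongside the existing model quality assessment features mentioned above.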
