MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction

Tianqi Wu, Zhiye Guo, Jian Liu, Jianlin Cheng, Jie Hou

doi:10.1038/s41598-021-92395-6


Open Access


Journal: Scientific Reports	Publication Date: Jun 23, 2021
Citations: 1	License type: open-access

Affiliation: University of Missouri, Saint Louis University

Abstract

Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, few comprehensive open-source protein structure prediction packages are publicly available in the field. In this paper, we present our latest open-source protein tertiary structure prediction system, MULTICOM2, which integrates template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, an approach that is much faster and more accurate than that of MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as a template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 can predict good tertiary structures across the board. It predicts the correct fold for 76 CASP14 domains (95% of regular domains and 55% of hard domains) if only one prediction is made per domain. The success rate increases by 3% for both regular and hard domains if five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free structure modeling method on both TBM and FM targets is very close to that of the combined template-based and template-free modeling methods. This demonstrates that distance-based template-free modeling powered by deep learning can largely replace traditional template-based modeling, even on TBM targets that TBM methods used to dominate, and therefore provides a uniform structure modeling approach for any protein.
Finally, on the 38 CASP14 FM and FM/TBM hard domains, MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) were ranked among the top 20 automated server predictors in the CASP14 experiment. After combining multiple predictors from the same research group as one entry, MULTICOM-HYBRID was ranked no. 5. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
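The abstract says the template-free pipeline reconstructs tertiary structure from inter-residue distances predicted by DeepDist, but it does not describe the reconstruction step itself. As a minimal sketch of why a distance map can determine a 3D structure, the Python below swaps in classical multidimensional scaling (MDS), which recovers coordinates (up to rotation, translation and mirror reflection) from a complete, self-consistent distance matrix. This is not MULTICOM2's folding procedure, and the function names are hypothetical. Real predicted distances are noisy and usually trusted only below a cutoff, which is why practical pipelines typically optimize coordinates against distance restraints instead.

```python
import numpy as np


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix for an (n, 3) array of residue coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def coords_from_distances(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Classical MDS: embed an (n, n) distance matrix into `dim` dimensions.

    The result matches the source structure only up to rotation, translation
    and mirror reflection, because distances alone cannot fix chirality.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred squared distances give the Gram matrix of centred coordinates.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]    # indices of the `dim` largest
    # Rounding or noise can make small eigenvalues slightly negative; clip before sqrt.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_coords = rng.normal(scale=10.0, size=(30, 3))  # toy point cloud, not a real protein
    dist = pairwise_distances(true_coords)
    model = coords_from_distances(dist)
    # Every pairwise distance of the reconstructed model matches the input map.
    print(np.allclose(pairwise_distances(model), dist, atol=1e-6))  # prints True
```

With exact distances the reconstruction is exact; with predicted distances the clipped eigenvalues and the mirror ambiguity are where a restraint-based optimizer earns its keep.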

Highlights

Protein structure prediction is an important problem in bioinformatics and has been studied for decades
Since MULTICOM2 is an automated prediction system, we evaluated its performance together with other CASP14 automated server predictors on 38 free modeling (FM) and FM/template-based modeling (TBM) domains, excluding CASP14 human predictors, whose predictions involve human intervention
We develop and release our latest automated protein structure prediction system (MULTICOM2) as an open-source software package for the community to use
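The abstract ranks MULTICOM-HYBRID no. 5 "after combining multiple predictors from the same research group as one entry", but it does not say how servers were combined or which score was used. A minimal sketch, assuming each group is represented by its best-scoring server and that a higher score is better; the function name, server names, group names and scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_groups(server_scores: Dict[str, float],
                server_group: Dict[str, str]) -> List[Tuple[str, str, float]]:
    """Collapse servers into one entry per group, then rank the groups.

    Assumption: a group's entry is its best-scoring server (higher is better).
    Returns (group, representative_server, score) tuples, best first.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for server, score in server_scores.items():
        group = server_group[server]
        if group not in best or score > best[group][1]:
            best[group] = (server, score)
    entries = [(group, server, score) for group, (server, score) in best.items()]
    return sorted(entries, key=lambda entry: entry[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical servers, groups and scores; not CASP14 data.
    scores = {"alpha-1": 40.0, "alpha-2": 55.0, "beta-1": 50.0, "gamma-1": 45.0}
    groups = {"alpha-1": "alpha", "alpha-2": "alpha", "beta-1": "beta", "gamma-1": "gamma"}
    for rank, (group, server, score) in enumerate(rank_groups(scores, groups), start=1):
        print(rank, group, server, score)
```

Collapsing first means a group with several servers cannot occupy several slots, which is why a server's per-server placing (top 20) and its group's placing (no. 5) can differ.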

Summary

Introduction

Protein structure prediction is an important problem in bioinformatics and has been studied for decades. We present our latest open-source protein tertiary structure prediction system, MULTICOM2, which integrates template-based modeling (TBM) and template-free modeling (FM) methods. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 can predict good tertiary structures across the board. It predicts the correct fold for 76 CASP14 domains (95% of regular domains and 55% of hard domains) if only one prediction is made per domain. When there are no good templates, template-free modeling methods are the only viable choice for constructing good structural models without referring to known protein structure templates.
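The figures above rest on two measures: the TM-score of a model against the experimental structure, and a per-domain "correct fold" test for the top one or top five models. The sketch below assumes the standard TM-score definition of Zhang and Skolnick, evaluated for one fixed superposition (the real TM-score program also searches for the superposition that maximizes the score), and assumes a fold counts as correct when TM-score is at least 0.5, a common convention that the text does not state. Function names and toy scores are hypothetical.

```python
from typing import Sequence


def d0(target_length: int) -> float:
    """TM-score distance scale d0(L) = 1.24 * (L - 15)^(1/3) - 1.8 (valid for L > 15)."""
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(deviations: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition.

    `deviations` holds the distance (in Angstroms) between each aligned model
    residue and its native counterpart; unaligned residues contribute nothing,
    but the sum is still normalized by the full target length.
    """
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in deviations) / target_length


def fold_success_rate(model_scores: Sequence[Sequence[float]], top_n: int = 1,
                      threshold: float = 0.5) -> float:
    """Fraction of domains whose best model among the first `top_n` reaches `threshold`."""
    hits = sum(1 for scores in model_scores if max(scores[:top_n]) >= threshold)
    return hits / len(model_scores)


if __name__ == "__main__":
    print(tm_score([0.0] * 100, 100))  # a perfect 100-residue model scores 1.0
    # Hypothetical TM-scores of five ranked models for each of three domains.
    domains = [[0.62, 0.55, 0.40, 0.38, 0.30],
               [0.45, 0.52, 0.41, 0.39, 0.33],
               [0.21, 0.25, 0.30, 0.28, 0.19]]
    print(fold_success_rate(domains, top_n=1))  # only the first model counts: 1 of 3
    print(fold_success_rate(domains, top_n=5))  # best of five models: 2 of 3
```

As a consistency check on the reported totals, 95% of 58 regular domains is about 55 and 55% of 38 hard domains is about 21, which together give the 76 correctly folded domains.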



