Analysis of several key factors influencing deep learning-based inter-residue contact prediction.

Tianqi Wu, Jie Hou, Badri Adhikari, Jianlin Cheng

doi:10.1093/bioinformatics/btz679


Open Access

https://doi.org/10.1093/bioinformatics/btz679


Abstract

Motivation: Deep learning has become the dominant technology for protein contact prediction. However, the factors that affect the performance of deep learning in contact prediction have not been systematically investigated.

Results: We analyzed the results of our three deep learning-based contact prediction methods (MULTICOM-CLUSTER, MULTICOM-CONSTRUCT and MULTICOM-NOVEL) in the CASP13 experiment and identified several key factors [i.e. deep learning technique, multiple sequence alignment (MSA), distance distribution prediction and domain-based contact integration] that influenced the contact prediction accuracy. We compared our convolutional neural network (CNN)-based contact prediction methods with three coevolution-based methods on 75 CASP13 targets consisting of 108 domains. We demonstrated that the CNN-based multi-distance approach was able to leverage global coevolutionary coupling patterns composed of multiple correlated contacts for more accurate contact prediction than the local coevolution-based methods, leading to a substantial increase in precision of 19.2 percentage points. We also tested different alignment methods and domain-based contact prediction with the deep learning contact predictors. The comparison of the three methods showed that deeper sequence alignments and the integration of domain-based contact prediction with the full-length contact prediction improved the performance of contact prediction. Moreover, we demonstrated that domain-based contact prediction based on a novel ab initio approach of parsing domains from MSAs alone, without using known protein structures, was a simple, fast approach to improve contact prediction. Finally, we showed that predicting the distribution of inter-residue distances in multiple distance intervals could capture more structural information and improve binary contact prediction.

Availability and implementation: https://github.com/multicom-toolbox/DNCON2/

Supplementary information: Supplementary data are available at Bioinformatics online.
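The relationship between a multi-interval distance prediction and a binary contact call, and the precision figure quoted above, can be illustrated with a minimal sketch. It is not the authors' implementation: the distance bins, the 8 Å contact cutoff and the function names `contact_probability` and `top_k_precision` are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[int, int]

# Hypothetical distance intervals in angstroms as (lower, upper) pairs.
# The intervals actually used by the paper are not stated in this section.
BINS: List[Tuple[float, float]] = [
    (0.0, 6.0),
    (6.0, 8.0),
    (8.0, 10.0),
    (10.0, float("inf")),
]
CONTACT_CUTOFF = 8.0  # assumed contact threshold, a common convention


def contact_probability(bin_probs: Sequence[float]) -> float:
    """Sum the probability mass of every bin lying entirely below the cutoff."""
    if len(bin_probs) != len(BINS):
        raise ValueError("one probability per distance bin is required")
    return sum(p for (_, upper), p in zip(BINS, bin_probs) if upper <= CONTACT_CUTOFF)


def top_k_precision(scores: Dict[Pair, float], true_contacts: Set[Pair], k: int) -> float:
    """Fraction of the k highest-scoring residue pairs that are true contacts."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    return sum(pair in true_contacts for pair in ranked) / len(ranked)


# Toy predictions: one probability per bin for each residue pair.
predicted = {
    (1, 10): [0.5, 0.3, 0.1, 0.1],
    (2, 20): [0.05, 0.15, 0.3, 0.5],
    (3, 30): [0.1, 0.5, 0.2, 0.2],
}
scores = {pair: contact_probability(p) for pair, p in predicted.items()}
native = {(1, 10), (2, 20)}  # toy set of true contacts
print(top_k_precision(scores, native, k=2))
```

In this toy case the two top-ranked pairs are (1, 10) and (3, 30), of which only the first is a true contact, so the top-2 precision is one half. In practice k is often tied to the sequence length, but that choice is likewise an assumption here.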

Highlights

Evolutionary variation in protein sequences is constrained by protein function and structure
We analyzed the results of our three deep learning-based contact prediction methods (MULTICOM-CLUSTER, MULTICOM-CONSTRUCT and MULTICOM-NOVEL) in the CASP13 experiment and identified several key factors [i.e. deep learning technique, multiple sequence alignment (MSA), distance distribution prediction and domain-based contact integration] that influenced the contact prediction accuracy
We demonstrated how the contact distance distribution prediction helped improve the performance of contact prediction and investigated how the number of effective sequences (Neff) in MSAs, MSA generation protocols and domain parsing method contributed to the contact prediction improvement
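The number of effective sequences (Neff) mentioned above can be sketched as follows. This uses one widely used definition, in which each sequence is down-weighted by the number of sequences in the alignment that are at least a given fraction identical to it. The definition, the 0.8 identity threshold and the function names are assumptions for illustration; the exact Neff formula used by the authors is not given in this section.

```python
from typing import List


def sequence_identity(a: str, b: str) -> float:
    """Fraction of aligned columns in which two equal-length rows agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def neff(msa: List[str], identity_threshold: float = 0.8) -> float:
    """Sum over sequences of 1 / (number of sequences similar to it)."""
    total = 0.0
    for seq in msa:
        # Every sequence is similar to itself, so neighbours is always >= 1.
        neighbours = sum(
            sequence_identity(seq, other) >= identity_threshold for other in msa
        )
        total += 1.0 / neighbours
    return total


# Toy alignment: the first two rows are 90% identical, the third is unrelated.
toy_msa = ["ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY"]
print(neff(toy_msa))
```

The two near-identical rows contribute half a sequence each and the unrelated row contributes one, so this alignment of three rows has an effective depth of two. Gap characters are treated like any other symbol in this simplified version.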

Summary

Introduction

Evolutionary variation in protein sequences is constrained by protein function and structure. Observed correlated mutation patterns in the sequences of a protein family indicate direct physical contact between residue pairs in its 3D structure (Altschuh et al., 1988), which can be used for inter-residue contact prediction (Gobel et al., 1994). An approximate 3D protein structure can be built with good accuracy if a sufficient number of accurately predicted residue–residue contacts is available (Marks et al., 2011; Monastyrskyy et al., 2014). Owing to advances in DNA/RNA sequencing technology (Meyer et al., 2008; Wilke et al., 2016), a large number of sequences are available in public databases, making it possible to characterize correlations between residue pairs of many proteins more accurately for contact prediction than before. Some of these correlations, however, may reflect functional constraints without structural implication, and some may be accidental, indirect correlations caused by transitive effects (Weigt et al., 2009).
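As a rough illustration of how correlated mutations between two alignment columns can be quantified, the sketch below computes the mutual information between a pair of MSA columns. Mutual information is used here only as a simple stand-in for a covariation score; it is not the method analyzed in the paper, and the function names are hypothetical. As a purely pairwise statistic it cannot by itself separate direct couplings from the indirect, transitive correlations mentioned above.

```python
import math
from collections import Counter
from typing import List


def column(msa: List[str], index: int) -> List[str]:
    """Extract one alignment column as a list of residues."""
    return [seq[index] for seq in msa]


def mutual_information(msa: List[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    if n == 0:
        raise ValueError("the alignment must contain at least one sequence")
    col_i, col_j = column(msa, i), column(msa, j)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Toy alignment: columns 0 and 1 co-vary perfectly, column 2 is conserved.
toy_msa = ["AKG", "AKG", "DEG", "DEG"]
print(mutual_information(toy_msa, 0, 1))  # co-varying pair
print(mutual_information(toy_msa, 0, 2))  # conserved column carries no signal
```

The co-varying pair scores ln 2, while the pair involving the conserved column scores zero, which is why fully conserved positions contribute no covariation signal regardless of whether they are in contact.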



Journal: Bioinformatics	Publication Date: Aug 30, 2019
Citations: 35	License type: CC BY 4.0


