MADOKA: an ultra-fast approach for large-scale protein structure similarity searching

Lei Deng, Guolun Zhong, Judong Luo, Chenzhe Liu, Hui Liu

doi:10.1186/s12859-019-3235-1

Abstract

Background: Protein comparative analysis and similarity searches play essential roles in structural bioinformatics. Several algorithms for protein structure alignment have been developed in recent years. However, given the rapid growth of protein structure data, improving overall comparison performance and running efficiency with massive sequences remains challenging.

Results: Here, we propose MADOKA, an ultra-fast approach for massive structural neighbor searching using a novel two-phase algorithm. First, we apply a fast alignment to each pair of structures. Then, we use a score to select the more similar pairs and carry out a more accurate fragment-based, residue-level alignment on them. MADOKA performs about 6–100 times faster than existing methods, including TM-align and SAL, in massive alignments. Moreover, the structural alignment quality of MADOKA is better than that of the existing algorithms in terms of TM-score and number of aligned residues. We also develop a web server for searching structural neighbors in the PDB database (about 360,000 protein chains in total), with additional features such as 3D structure alignment visualization. The MADOKA web server is freely available at: http://madoka.denglab.org/

Conclusions: MADOKA is an efficient approach to searching for protein structure similarity. In addition, we provide a parallel implementation of MADOKA that exploits the power of multi-core CPUs.
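The abstract reports alignment quality in terms of TM-score. As a reminder of what that number measures, here is a minimal sketch of the standard TM-score sum for one fixed superposition. It is not taken from the MADOKA code: the function names and the `D0_FLOOR` value are assumptions made for illustration, and a real TM-score additionally maximizes this sum over superpositions, which the sketch does not attempt.

```python
# Sketch of the TM-score sum for a single, already-superposed alignment.
# Assumptions (not from the paper): function names, and the 0.5 Angstrom
# floor used to keep d0 positive for very short chains.

D0_FLOOR = 0.5  # assumed lower bound on the distance scale, in Angstroms


def tm_d0(target_length):
    """Length-dependent distance scale d0 = 1.24 * cbrt(L - 15) - 1.8."""
    if target_length <= 15:
        return D0_FLOOR
    return max(D0_FLOOR, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """Sum 1 / (1 + (d_i / d0)^2) over aligned residue pairs, divided by
    the target length. `distances` holds one distance (in Angstroms) per
    aligned pair; unaligned target residues contribute nothing."""
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect alignment covering every target residue scores 1.0.
perfect = tm_score([0.0] * 100, 100)
# Aligning only half of the residues, even perfectly, caps the score at 0.5.
half = tm_score([0.0] * 50, 100)
```

Because the sum is normalized by the target length rather than by the number of aligned pairs, a method can raise its TM-score either by aligning more residues or by aligning them more tightly, which is why the two measures are reported together.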

Highlights

Protein structure comparative analysis and similarity searches play essential roles in structural bioinformatics.
Proteins from different fold families in the Structural Classification of Proteins (SCOP) and CATH (Class, Architecture, Topology, Homologous superfamily) classifications may nevertheless share significant structural similarity [13].
Datasets: We use three datasets to assess the performance of MADOKA.

Summary

Results

SCOP and CATH [28] are used as standards for assessing the structure alignments produced by various methods. Proteins from different fold families in the SCOP and CATH classifications may nevertheless share significant structural similarity [13]. The third dataset is MALISAM [30], which consists of 130 protein pairs that differ in their SCOP [31] folds but are structurally analogous. The first-phase alignment greatly reduces the number of protein pairs that need a precise alignment in the second phase: 11,052 pairs complete both phases, about 55.5% of all 19,900 structure pairs. We use MADOKA to search for structural neighbors against the entire PDB database for each protein in the TM-align dataset. The calculation time corresponding to proteins with different lengths is shown
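The filtering idea behind the two phases can be sketched in a few lines of Python. This is only an illustration of the control flow, under stated assumptions: `two_phase_search`, `coarse_score`, `precise_align`, the threshold, and the toy string "structures" are all hypothetical stand-ins, not MADOKA's actual scoring functions or data format.

```python
from itertools import combinations


def two_phase_search(structures, coarse_score, precise_align, threshold):
    """Run a cheap score on every pair, then the expensive alignment only
    on pairs whose cheap score reaches `threshold`.
    Returns (results for surviving pairs, total number of pairs seen)."""
    results = {}
    n_pairs = 0
    for a, b in combinations(sorted(structures), 2):
        n_pairs += 1
        if coarse_score(structures[a], structures[b]) >= threshold:
            results[(a, b)] = precise_align(structures[a], structures[b])
    return results, n_pairs


# Hypothetical stand-ins: each "structure" is a short string of labels.
def toy_coarse(x, y):
    # Fraction of positions with the same label (cheap, phase one).
    return sum(p == q for p, q in zip(x, y)) / max(len(x), len(y))


def toy_precise(x, y):
    # Placeholder for the costly residue-level alignment (phase two).
    return sum(p == q for p, q in zip(x, y))


toy = {"a": "HHHEEE", "b": "HHHEEC", "c": "CCCCCC", "d": "HHHEEE"}
hits, n_pairs = two_phase_search(toy, toy_coarse, toy_precise, threshold=0.5)

# Arithmetic check on the figures quoted above.
fraction_kept = round(100 * 11052 / 19900, 1)   # percentage of pairs kept
pairs_of_200 = 200 * 199 // 2                   # unordered pairs of 200 items
```

In the toy run, three of the six pairs survive the first phase, so the expensive step runs half as often. For the quoted figures, 11,052 of 19,900 works out to 55.5%, and 19,900 happens to equal the number of unordered pairs among 200 structures, although the excerpt itself does not state the dataset size.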

Journal: BMC Bioinformatics	Publication Date: Dec 1, 2019
Citations: 22	License type: open-access
