Abstract

BackgroundProtein comparative analysis and similarity searches play essential roles in structural bioinformatics. A couple of algorithms for protein structure alignments have been developed in recent years. However, facing the rapid growth of protein structure data, improving overall comparison performance and running efficiency with massive sequences is still challenging.ResultsHere, we propose MADOKA, an ultra-fast approach for massive structural neighbor searching using a novel two-phase algorithm. Initially, we apply a fast alignment between pairwise structures. Then, we employ a score to select pairs with more similarity to carry out a more accurate fragment-based residue-level alignment. MADOKA performs about 6–100 times faster than existing methods, including TM-align and SAL, in massive alignments. Moreover, the quality of structural alignment of MADOKA is better than the existing algorithms in terms of TM-score and number of aligned residues. We also develop a web server to search structural neighbors in PDB database (About 360,000 protein chains in total), as well as additional features such as 3D structure alignment visualization. The MADOKA web server is freely available at: http://madoka.denglab.org/ConclusionsMADOKA is an efficient approach to search for protein structure similarity. In addition, we provide a parallel implementation of MADOKA which exploits massive power of multi-core CPUs.

Highlights

  • Protein structure comparative analysis and similarity searches play essential roles in structural bioinformatics

  • Proteins that differ from fold families in the Structural classification of proteins (SCOP) and Protein class (CATH) categories may contain significant structural similarity [13]

  • Datasets We use three datasets to assess the performance of MADOKA

Read more

Summary

Results

SCOP and CATH [28] are used as standards for assessing the structure alignment in various methods. Proteins that differ from fold families in the SCOP and CATH categories may contain significant structural similarity [13]. The third is MALISAM [30], which consists of 130 protein pairs that are different in terms of SCOP [31] folds but structurally analogous. By the first-phase alignment, our method largely narrows down the number of pairwise proteins for precise alignments to be done in the second phase, 11,052 pairs complete both phases in total, which account for about 55.5% of all 19,900 structure pairs. We use MADOKA to search structure neighbors against the entire PDB database for each protein in the TM-align dataset. The calculation time corresponding to proteins with different lengths is shown

Conclusions
Method
Benchmark Method
Discussion
Conclusion
Methods
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call