Speeding up all-against-all protein comparisons while maintaining sensitivity by considering subsequence-level homology.

Lucas D Wittwer, Christophe Dessimoz, Ivana Piližota, Adrian M Altenhoff

doi:10.7717/peerj.607

Abstract

Orthology inference and other sequence analyses across multiple genomes typically start by performing exhaustive pairwise sequence comparisons, a process referred to as “all-against-all”. As this process scales quadratically with the number of sequences analysed, this step can become a bottleneck, thus limiting the number of genomes that can be simultaneously analysed. Here, we explored ways of speeding up the all-against-all step while maintaining its sensitivity. By exploiting the transitivity of homology and, crucially, ensuring that homology is defined in terms of consistent protein subsequences, our proof-of-concept resulted in a 4× speedup while recovering >99.6% of all homologs identified by the full all-against-all procedure on empirical sequence sets. In comparison, state-of-the-art k-mer approaches are orders of magnitude faster but only recover 3–14% of all homologous pairs. We also outline ideas to further improve the speed and recall of the new approach. An open source implementation is provided as part of the OMA standalone software at http://omabrowser.org/standalone.
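The constraint that homology inferred by transitivity must refer to consistent protein subsequences can be illustrated with a minimal sketch. It assumes the local alignments A–B and B–C have already been computed and are given as half-open residue intervals on the intermediate sequence B. The function names and the 30-residue `min_overlap` threshold are hypothetical and are not taken from the OMA standalone implementation.

```python
from typing import Tuple

# Half-open residue range [start, end) on one protein sequence.
Interval = Tuple[int, int]


def overlap(a: Interval, b: Interval) -> int:
    """Number of residues shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def infer_homology_via_intermediate(
    b_region_from_ab: Interval,
    b_region_from_bc: Interval,
    min_overlap: int = 30,  # hypothetical threshold, in residues
) -> bool:
    """Infer A ~ C from A ~ B and B ~ C only if both alignments cover a
    shared stretch of B of at least `min_overlap` residues.

    If the two regions of B are disjoint (for instance A and C each match a
    different domain of a multi-domain protein B), nothing is inferred.
    """
    return overlap(b_region_from_ab, b_region_from_bc) >= min_overlap


# B is a two-domain protein: residues 0-120 align to A, 150-300 align to C.
print(infer_homology_via_intermediate((0, 120), (150, 300)))  # False
# Both alignments cover the same region of B (160 shared residues).
print(infer_homology_via_intermediate((10, 200), (40, 260)))  # True
```

The sketch reasons only in the coordinates of the intermediate sequence: transitivity is trusted only when both alignments land on a shared stretch of B, which is what prevents chaining together proteins that merely share different domains with B.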

Highlights

All four clustering variants decreased runtime compared to the full all-against-all algorithm, with a speedup factor of 2–9 depending on the variant and dataset (Fig. 4)
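To make the quadratic cost and the effect of clustering concrete, here is a hypothetical greedy scheme that compares each sequence only against existing cluster representatives. It is not one of the four clustering variants evaluated in the paper. `is_homolog` stands in for an alignment-based test (a toy first-residue rule in the example), and the sketch counts only the comparisons needed to assign sequences to clusters.

```python
from typing import Callable, List, Sequence, Tuple


def full_pair_count(n: int) -> int:
    """Comparisons in a full all-against-all: every unordered pair once."""
    return n * (n - 1) // 2


def greedy_representative_clustering(
    seqs: Sequence[str],
    is_homolog: Callable[[str, str], bool],
) -> Tuple[List[List[int]], int]:
    """Assign each sequence to the first cluster whose representative it
    matches, otherwise start a new cluster. Returns the clusters (lists of
    sequence indices, representative first) and the number of comparisons."""
    clusters: List[List[int]] = []
    comparisons = 0
    for i, seq in enumerate(seqs):
        for cluster in clusters:
            comparisons += 1
            if is_homolog(seqs[cluster[0]], seq):
                cluster.append(i)
                break
        else:  # no representative matched (or no clusters exist yet)
            clusters.append([i])
    return clusters, comparisons


# Toy stand-in for an alignment-based homology test: same first residue.
def toy_homolog(a: str, b: str) -> bool:
    return a[0] == b[0]


seqs = ["MKV", "MKL", "AAT", "MKT", "AGT", "WWW"]
clusters, comparisons = greedy_representative_clustering(seqs, toy_homolog)
print(clusters)  # [[0, 1, 3], [2, 4], [5]]
print(comparisons, "vs", full_pair_count(len(seqs)))  # 7 vs 15
```

The number of comparisons is bounded by roughly n times the number of clusters rather than n(n-1)/2, so the saving grows when sequences fall into few, large families. The trade-off is that members are compared only with the representative, so two members matching different regions of the representative would be grouped wrongly; this is the subsequence-consistency problem raised in the abstract.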

Summary

Introduction

Advances in genome sequencing have led to an immense increase in the number of available genomes (Metzker, 2009; Pagani et al., 2012). As the experimental annotation of these sequences would be prohibitively slow and expensive, there is strong interest in computational methods (reviewed in Rentzsch & Orengo, 2009). Homologous proteins, which can be divided into paralogs and orthologs, descend from a common ancestral protein (Fitch, 1970). Paralogous sequences, which began diverging through gene duplication, are believed to drive functional innovation and specialisation, whereas orthologous sequences, which diverged through speciation, tend to have more similar biological functions (Tatusov, …).
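The orthology/paralogy distinction (Fitch, 1970) depends only on the event at which two genes last diverged. A minimal sketch, assuming a gene tree whose internal nodes are already labelled as speciation or duplication events (obtaining that labelling is itself the hard inference problem); the `Node` class and the gene names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/children
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list, repr=False)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach `child` below this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def lineage(node: Optional[Node]) -> List[Node]:
    """The node followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def classify_pair(a: Node, b: Node) -> str:
    """Orthologs if a and b last diverged at a speciation, paralogs if at a duplication."""
    ancestors_of_a = set(lineage(a))
    lca = next((n for n in lineage(b) if n in ancestors_of_a), None)
    if lca is None or lca.event not in ("speciation", "duplication"):
        raise ValueError("need two distinct leaves of the same labelled tree")
    return "orthologs" if lca.event == "speciation" else "paralogs"


# A duplication followed by speciation in both copies.
root = Node("dup", "duplication")
spec1 = root.add(Node("spec1", "speciation"))
spec2 = root.add(Node("spec2", "speciation"))
human_a, mouse_a = spec1.add(Node("human_A")), spec1.add(Node("mouse_A"))
human_b, mouse_b = spec2.add(Node("human_B")), spec2.add(Node("mouse_B"))

print(classify_pair(human_a, mouse_a))  # orthologs
print(classify_pair(human_a, human_b))  # paralogs
print(classify_pair(human_a, mouse_b))  # paralogs
```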

Journal: PeerJ
Publication Date: Oct 7, 2014
Citations: 26
License type: cc-by

Similar Papers

Initial Characterization of a cDNA Encoding a Heat Shock Protein Homolog From Stachybotrys chartarum
F.M Blachere ... D.H Beezhold
Journal of Allergy and Clinical Immunology | VOL. 119
01 Jan 2007

Identification of New Genes Regulated by the Crt1 Transcription Factor, an Effector of the DNA Damage Checkpoint Pathway in Saccharomyces cerevisiae
Jolanta Zaim ... Andrzej M Kierzek
Journal of Biological Chemistry | VOL. 280
01 Jan 2004

Assessment of Rapid MinION Nanopore DNA Virus Meta-Genomics Using Calves Experimentally Infected with Bovine Herpes Virus-1.
Gaelle Esnault ... Paula Lagan
Viruses | VOL. 14
24 Aug 2022

An On-Line Isocratic HPLC System for the Analysis of PTH-Amino Acids on A Gas-Phase Sequencer
J. E. Shively ... B. Krieger
01 Jan 1987
