Efficient inference of homologs in large eukaryotic pan-proteomes

Siavash Sheikhizadeh Anari, Sandra Smit, Dick De Ridder, M Eric Schranz

doi:10.1186/s12859-018-2362-4


Journal: BMC Bioinformatics	Publication Date: Sep 26, 2018
Citations: 12	License type: open-access

Affiliation: Wageningen University & Research

Abstract

Background

Identification of homologous genes is fundamental to comparative genomics, functional genomics and phylogenomics. Extensive public homology databases are of great value for investigating homology but need to be continually updated to incorporate new sequences. As new sequences are rapidly being generated, there is a need for efficient standalone tools to detect homologs in novel data.

Results

To address this, we present a fast method for detecting homology groups across a large number of individuals and/or species. We adopted a k-mer based approach which considerably reduces the number of pairwise protein alignments without sacrificing sensitivity. We demonstrate accuracy, scalability, efficiency and applicability of the presented method for detecting homology in large proteomes of bacteria, fungi, plants and Metazoa.

Conclusions

We clearly observed the trade-off between recall and precision in our homology inference. Favoring recall or precision strongly depends on the application. The clustering behavior of our program can be optimized for particular applications by altering a few key parameters. The program is available for public use at https://github.com/sheikhizadeh/pantools as an extension to our pan-genomic analysis tool, PanTools.
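One common way to make the recall/precision trade-off concrete is to score homology groups pairwise. The sketch below is an illustration under our own assumptions, not PanTools code: it treats two genes as a positive pair when they share a group, and compares a predicted grouping against a hypothetical reference grouping. The function names and this particular metric definition are ours.

```python
from __future__ import annotations

from itertools import combinations


def same_group_pairs(groups: list[set[str]]) -> set[frozenset[str]]:
    """Return every unordered pair of genes that share a group."""
    pairs: set[frozenset[str]] = set()
    for group in groups:
        for a, b in combinations(sorted(group), 2):
            pairs.add(frozenset((a, b)))
    return pairs


def pairwise_precision_recall(
    predicted: list[set[str]], reference: list[set[str]]
) -> tuple[float, float]:
    """Pairwise precision and recall of a predicted grouping vs. a reference.

    precision = shared pairs / pairs in predicted groups
    recall    = shared pairs / pairs in reference groups
    An empty pair set is scored as 1.0 by convention (an assumption).
    """
    pred = same_group_pairs(predicted)
    ref = same_group_pairs(reference)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 1.0
    recall = true_pos / len(ref) if ref else 1.0
    return precision, recall


if __name__ == "__main__":
    reference = [{"a", "b", "c"}, {"d", "e"}]  # hypothetical "true" groups
    # Over-merging into one big group favors recall.
    print(pairwise_precision_recall([{"a", "b", "c", "d", "e"}], reference))  # (0.4, 1.0)
    # Over-splitting into small groups favors precision.
    print(pairwise_precision_recall([{"a", "b"}, {"c"}, {"d", "e"}], reference))  # (1.0, 0.5)
```

In this toy example, coarser groupings raise recall at the cost of precision and finer groupings do the opposite, which is the general shape of the trade-off the abstract describes.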

Highlights

Identification of homologous genes is fundamental to comparative genomics, functional genomics and phylogenomics
We represent a pan-genome by a hierarchy of genome, annotation and proteome layers stored in a Neo4j graph database to connect different types of data (Fig. 1)
We tested scalability on 5 datasets of increasing size compiled from 93 Saccharomyces cerevisiae strains [20] and 5 datasets compiled from 19 Arabidopsis thaliana accessions [21]

Summary

Results

We present a fast method for detecting homology groups across a large number of individuals and/or species. We adopted a k-mer based approach which considerably reduces the number of pairwise protein alignments without sacrificing sensitivity. We demonstrate the scalability, efficiency and applicability of the presented method for detecting homology in large proteomes of bacteria, fungi, plants and Metazoa.
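The general idea of a k-mer prefilter can be illustrated with a minimal sketch: index every protein by its distinct k-mers and keep only pairs that share at least a minimum number of them, so that only those candidates go on to pairwise alignment. The values of `k` and `min_shared`, the function names and the counting rule are illustrative assumptions, not PanTools' actual parameters or implementation; the alignment and grouping steps that would follow are not shown.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Distinct k-mers of a protein sequence (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_pairs(
    proteins: dict[str, str], k: int = 3, min_shared: int = 2
) -> set[tuple[str, str]]:
    """Protein-ID pairs sharing at least `min_shared` distinct k-mers.

    k and min_shared are hypothetical defaults, not PanTools settings.
    """
    # Inverted index: k-mer -> IDs of proteins containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for pid, seq in proteins.items():
        for km in kmers(seq, k):
            index[km].add(pid)

    # Count shared distinct k-mers only for pairs that co-occur in the index;
    # pairs with no k-mer in common are never enumerated.
    shared: dict[tuple[str, str], int] = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1

    return {pair for pair, n in shared.items() if n >= min_shared}


if __name__ == "__main__":
    proteins = {
        "p1": "MKTAYIAKQR",
        "p2": "MKTAYIAKQL",  # differs from p1 only at the last residue
        "p3": "GGGGWWWWPP",  # unrelated sequence
    }
    # Only the (p1, p2) pair would be passed on to pairwise alignment.
    print(candidate_pairs(proteins))  # {('p1', 'p2')}
```

The saving over all-against-all alignment comes from never enumerating pairs without shared k-mers; very frequent k-mers still generate many pairs, so a production tool would likely need to treat them specially.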



Similar Papers

Functional genomics and the comparative physiology of hypoxia.
Frank L Powell
Annual Review of Physiology | VOL. 65
01 May 2002

PlantFUNCO: Integrative Functional Genomics Database Reveals Clues into Duplicates Divergence Evolution.
Víctor Roces ... Andrey Rzhetsky
Molecular biology and evolution | VOL. 41
27 Feb 2024

Stoichiometry of site-specific protein phosphorylation estimated with phosphopeptide-specific antibodies.
Cristinel P Mîinea ... Gustav E Lienhard
BioTechniques | VOL. 34
01 Apr 2003

Reliable and reproducible method to extract high-quality RNA from plant tissues rich in secondary metabolites.
Anders Lönneborg ... Marianne Jensen
BioTechniques | VOL. 29
01 Oct 2000
