Abstract

Background: Pan-genome approaches afford the discovery of homology relations in a set of genomes by determining how gene families are distributed among them. Retrieving the complete gene distribution across a class of genomes is an NP-hard problem, and computational costs increase with the number of analyzed genomes because all-against-all gene comparisons are required to solve it completely. In the presence of phylogenetically distant genomes, the variability introduced by gene duplication and transmission makes recognizing homologous genes even more difficult. A challenge in this field is to design fast and adaptive similarity measures that yield a suitable pan-genome structure of homology relations.

Results: We present PanDelos, a standalone tool for the discovery of pan-genome content among phylogenetically distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Candidate homology relations are integrated into a global network, and groups of homologous genes are extracted by applying a community detection algorithm.

Conclusions: PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running time and quality of content discovery. Tests were run on collections of real genomes, previously used in analogous studies, and on synthetic benchmarks that provide a fully trusted ground truth. The software is available at https://github.com/GiugnoLab/PanDelos.
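
To make the idea of an alignment-free measure based on k-mer multiplicity more concrete, the following is a minimal sketch in Python. It assumes a generalized Jaccard index over k-mer counts, which is one common way to compare multiplicities; the abstract does not state the exact formula, so the function names (`kmer_counts`, `multiplicity_similarity`) and the choice of index are illustrative assumptions, not the PanDelos implementation.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def multiplicity_similarity(a: str, b: str, k: int) -> float:
    """Generalized Jaccard index on k-mer multiplicities.

    Assumption: sum of per-k-mer minimum counts divided by the sum of
    per-k-mer maximum counts. Returns 0.0 when neither sequence has a k-mer.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    kmers = ca.keys() | cb.keys()
    shared = sum(min(ca[x], cb[x]) for x in kmers)
    total = sum(max(ca[x], cb[x]) for x in kmers)
    return shared / total if total else 0.0
```

Because the score depends only on k-mer counts, no alignment is computed; identical sequences score 1.0 and sequences with no k-mer in common score 0.0.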

Highlights

  • Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how some gene families are distributed among a given set of genomes

  • PanDelos has been compared to Roary and EDGAR

  • It runs under Linux systems and takes genomic data in GFF format as input

Summary

Introduction

Pan-genome approaches afford the discovery of homology relations in a set of genomes by determining how gene families are distributed among them. Homologous genes can be distinguished into paralogous, when homology occurs within the same genome, and orthologous, when homology occurs between different genomes. We call pan-genome content discovery the determination of homologous groups within a collection of genomes. Different mechanisms are involved in gene transmission. Orthology is associated with “vertical” transmission, which happens among genomes in the same lineage and involves most of the genetic content. “Horizontal” transmission occurs between genomes of organisms of different lineages and involves one or a few genes. Genes present in every genome are the core genes of the pan-genome, and they may be involved in essential living functions.
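
The definitions above can be illustrated with a short Python sketch. It assumes that each gene is represented as a `(genome_id, gene_id)` pair and that homologous groups are already available as sets of such pairs; the function names `classify_pairs` and `core_families` are hypothetical and chosen only for this example.

```python
from itertools import combinations


def classify_pairs(family):
    """Split gene pairs of one homologous group into paralogs and orthologs.

    A pair is paralogous when both genes come from the same genome and
    orthologous when they come from different genomes.
    """
    paralogs, orthologs = [], []
    for g1, g2 in combinations(sorted(family), 2):
        if g1[0] == g2[0]:
            paralogs.append((g1, g2))
        else:
            orthologs.append((g1, g2))
    return paralogs, orthologs


def core_families(families, genomes):
    """Return the groups that have at least one gene in every genome."""
    all_genomes = set(genomes)
    return [fam for fam in families
            if {genome for genome, _ in fam} == all_genomes]
```

For instance, a group containing two genes from genome A and one from genome B yields one paralogous pair and two orthologous pairs, and it counts as a core group when A and B are the only genomes in the collection.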

