Abstract

Although the pan-genome concept originated in prokaryote genomics, an increasing number of eukaryote species pan-genomes have also been analysed. However, less software is available for eukaryote pan-genome analysis than for prokaryotes. In a previous study, we analysed the pan-genomes of four model fungi with a computational pipeline that constructed pan-genomes using the synteny-dependent Pan-genome Ortholog Clustering Tool (PanOCT) approach. Here, we present a modified and improved version of that pipeline, which we have called Pangloss. Pangloss can perform gene prediction for a user-provided set of genomes from a given species, construct and optionally refine a species pan-genome from that set using PanOCT, and perform various functional characterisation and visualisation analyses of species pan-genome data. To demonstrate Pangloss’s capabilities, we constructed and analysed a species pan-genome for the oleaginous yeast Yarrowia lipolytica and also reconstructed a previously published species pan-genome for the opportunistic respiratory pathogen Aspergillus fumigatus. Pangloss is implemented in Python, Perl and R and is freely available under an open source GPLv3 licence via GitHub.

Highlights

  • Species pan-genomes have been extensively studied in prokaryotes, where pan-genome evolution is primarily driven by rampant horizontal gene transfer (HGT) [1,2,3,4]

  • Unlike prokaryote pan-genomes, eukaryote pan-genomes evolve via a variety of processes besides HGT; these include variations in ploidy and heterozygosity within plants [8], and cases of introgression, gene duplication and repeat-induced point mutation in fungi and plankton [9,10,11,12]

  • A Y. lipolytica species pan-genome was constructed with Pangloss via the Pan-genome Ortholog Clustering Tool (PanOCT) using publicly available assembly data from seven strains, including the reference CLIB122 strain and a number of other industrially relevant strains [24,54,55,56] (Table S1)


Introduction

Species pan-genomes have been extensively studied in prokaryotes, where pan-genome evolution is primarily driven by rampant horizontal gene transfer (HGT) [1,2,3,4]. Unlike prokaryote pan-genomes, eukaryote pan-genomes evolve via a variety of processes besides HGT; these include variations in ploidy and heterozygosity within plants [8], and cases of introgression, gene duplication and repeat-induced point mutation in fungi and plankton [9,10,11,12]. Most software and pipelines available for pan-genome analysis are explicitly or implicitly intended for prokaryote datasets. The commonly cited pipeline Roary is intended for use with genomic location data generated by the prokaryote genome annotation software Prokka [13,14]. A number of other methodologies, such as seq-seq-pan or SplitMEM, use genome alignment or de Bruijn graph-based approaches for pan-genome construction, which are
