Abstract

Comparative genome analyses of eukaryotic pathogens including fungi and oomycetes have revealed extensive variability in genome composition and structure. The genomes of individuals from the same population can exhibit different numbers of chromosomes and different organization of chromosomal segments, defining so-called accessory compartments that have been shown to be crucial to pathogenicity in plant-infecting fungi. This high level of structural variation poses a methodological challenge for population genomic analyses. Variant discovery from population sequencing data is typically achieved using established pipelines based on the mapping of short reads to a reference genome. These pipelines have been developed, and extensively used, for eukaryote genomes of both plants and animals, to retrieve single nucleotide polymorphisms and short insertions and deletions. However, they do not permit the inference of large-scale genomic structural variation, as this task typically requires the alignment of complete genome sequences. Here, we compare traditional variant discovery approaches to a pipeline based on de novo genome assembly of short read data followed by whole genome alignment, using simulated data sets with properties mimicking those of fungal pathogen genomes. We show that the latter approach exhibits levels of performance comparable to those of read-mapping based methodologies, when used on sequence data with sufficient coverage. We argue that this approach further allows additional types of genomic diversity to be explored, in particular as long-read third-generation sequencing technologies are becoming increasingly available to generate population genomic data.

Highlights

  • Comparative genome studies of fungal and oomycete pathogens have revealed highly variable genome architecture and content [reviewed by Raffaele and Kamoun, 2012; Möller and Stukenbrock, 2017]

  • We assess the accuracy of a pipeline based on de novo genome assembly followed by whole genome alignment to simultaneously recover single nucleotide polymorphisms (SNPs) and large structural variants

  • Since fungal pathogen genomes differ from human genomes in many aspects, we here aimed to compare variant calling approaches on data sets mimicking the characteristics of fungal pathogen genomes, including accessory genome compartments and high nucleotide diversity


Summary

INTRODUCTION

Comparative genome studies of fungal and oomycete pathogens have revealed highly variable genome architecture and content [reviewed by Raffaele and Kamoun, 2012; Möller and Stukenbrock, 2017]. Population genomic datasets based on next-generation sequencing (NGS) can be used to recover genomic variants such as single nucleotide polymorphisms (SNPs), insertions and deletions (indels), and structural variants (SVs). The latter category includes translocations, inversions, duplications (either tandem or interspersed), deletions, and novel sequence insertions (Alkan et al., 2011). Methods for calling variants in populations from whole genome alignments (WGAs) are currently limited, and the available approaches have not been benchmarked with fungal genome data. We assess the accuracy of a pipeline based on de novo genome assembly followed by whole genome alignment (referred to as dnWGA, Figure 1) to simultaneously recover SNPs and large structural variants. Since fungal pathogen genomes differ from human genomes in many aspects, we here aimed to compare variant calling approaches on data sets mimicking the characteristics of fungal pathogen genomes, including accessory genome compartments and high nucleotide diversity.
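To illustrate the basic idea of recovering SNPs from an alignment rather than from mapped reads, the following minimal sketch scans two aligned sequences column by column and reports mismatches in reference coordinates. It is not the dnWGA pipeline itself: the function name `call_snps`, the use of `-` as the gap character, and the skipping of `N` bases are assumptions made for this example only.

```python
from typing import List, Tuple


def call_snps(ref_aln: str, qry_aln: str) -> List[Tuple[int, str, str]]:
    """Return (reference position, reference base, query base) per mismatch.

    Both inputs are rows of the same pairwise alignment, so they must have
    equal length. Positions are 1-based and count ungapped reference bases.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")

    snps = []
    ref_pos = 0  # coordinate in the ungapped reference sequence
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            ref_pos += 1  # only reference bases advance the coordinate
        if r == "-" or q == "-":
            continue  # gap column: part of an indel, not a SNP
        if r == "N" or q == "N":
            continue  # ambiguous base (assumption: treated as missing data)
        if r != q:
            snps.append((ref_pos, r, q))
    return snps


if __name__ == "__main__":
    # Toy alignment: one insertion and one deletion in the query,
    # plus two substitutions.
    print(call_snps("ACGT-ACGTA", "ACCTTAC-TG"))
```

Gap columns are skipped here because they belong to indels; larger structural variants such as translocations or inversions cannot be read off single alignment columns and need the block-level structure of the whole genome alignment.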

