Abstract
Most metazoans are associated with symbionts. Characterizing the effect of a particular symbiont often requires access to its genome, which is usually obtained by sequencing the whole community. We present MinYS, a targeted approach for assembling a particular genome of interest from such metagenomic data. First, using a reference genome, a subset of the reads is assembled into a set of backbone contigs. Then, this draft assembly is completed de novo using the whole metagenomic read set. The resulting assembly is output as a genome graph, which makes it possible to distinguish different strains, potentially carrying structural variants, that coexist in the sample. MinYS was applied to 50 pea aphid resequencing samples with varying symbiont community diversity in order to recover the genome sequence of the aphid's obligate bacterial symbiont, Buchnera aphidicola. It returned high-quality assemblies (a single-contig assembly in 90% of the samples), even when using increasingly distant reference genomes, and retrieved large structural variations in the samples. Because of its targeted nature, it outperformed standard metagenomic assemblers in terms of both running time and assembly quality.
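The targeted first step, keeping only the reads that resemble a reference genome before assembling them, can be illustrated with a minimal sketch. It assumes a simple shared-k-mer filter as a stand-in for read recruitment; the abstract does not say how MinYS selects reads, so the function names (`build_index`, `recruit_reads`), the k-mer size and the `min_shared` threshold are all hypothetical.

```python
from typing import Iterable, Iterator, List, Set

# Complement table for uppercase DNA; other characters are not supported.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(reference: str, k: int) -> Set[str]:
    """Collect reference k-mers from both strands, so that reads
    sequenced from either strand can match."""
    index = set(kmers(reference, k))
    index.update(kmers(revcomp(reference), k))
    return index


def recruit_reads(reads: Iterable[str], index: Set[str], k: int,
                  min_shared: int) -> List[str]:
    """Keep reads sharing at least `min_shared` k-mers with the index.

    Hypothetical filter for illustration; not MinYS's actual recruitment step.
    """
    kept = []
    for read in reads:
        shared = sum(1 for km in kmers(read, k) if km in index)
        if shared >= min_shared:
            kept.append(read)
    return kept


if __name__ == "__main__":
    reference = "ATGCGTACGTTAGCATCGGATCCA"  # toy "symbiont" reference
    reads = [
        "GCGTACGTTAG",   # forward-strand fragment of the reference
        "GGATCCGATGC",   # reverse-strand fragment of the reference
        "TTTTTTTTTTTT",  # unrelated, host-like read
    ]
    index = build_index(reference, k=5)
    print(recruit_reads(reads, index, k=5, min_shared=3))
    # → ['GCGTACGTTAG', 'GGATCCGATGC']
```

Exact k-mer matching is the simplest possible filter and tolerates neither sequencing errors nor much divergence from the reference; a practical pipeline would more plausibly rely on a read aligner, which matters when the reference is distant from the strain in the sample.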
Highlights
The number of reads is on average 84 [198] million for individual sequencing datasets, with an average coverage of 628× (3694×) for the B. aphidicola genome. In these datasets, more than 90% of the reads originate from the insect host and are not useful when focusing on symbiont genomes.
Summary
Advances in molecular techniques have greatly contributed to recognizing the importance of microorganisms in every ecosystem. As symbionts are generally not cultivable outside the host, the whole community is usually sequenced, resulting in a metagenomic dataset that mixes host and symbiont reads. These datasets are unbalanced: the great majority of the reads often originate from the host genome, but because symbiont genomes are often several orders of magnitude smaller than that of the eukaryotic host, they can still have a large read depth in such samples. This enables the extraction of relevant information about the symbionts, but requires significant effort, since the host reads are a computational burden for most analyses. In this context, bioinformatic tools that assemble a particular genome of interest from a metagenomic sample, ignoring the overwhelming amount of reads from other organisms, would greatly accelerate the characterization of symbiont genomes and help decipher specific host–symbiont relationships.
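The depth argument above follows from the usual estimate of average sequencing depth, C = N × L / G (number of reads × read length / genome size). The minimal sketch below uses entirely hypothetical round numbers, not the pea aphid data analysed here, to show how a symbiont contributing only 5% of the reads can still reach roughly fifty times the host's depth when its genome is a thousand times smaller.

```python
def expected_depth(n_reads: float, read_length: float, genome_size: float) -> float:
    """Average sequencing depth C = N * L / G."""
    return n_reads * read_length / genome_size


# All values below are hypothetical round numbers chosen for illustration.
TOTAL_READS = 100_000_000
READ_LENGTH = 100            # bp
HOST_GENOME = 500_000_000    # bp
SYMBIONT_GENOME = 500_000    # bp, 1,000 times smaller than the host
SYMBIONT_FRACTION = 0.05     # share of reads originating from the symbiont

host_depth = expected_depth(TOTAL_READS * (1 - SYMBIONT_FRACTION), READ_LENGTH, HOST_GENOME)
symbiont_depth = expected_depth(TOTAL_READS * SYMBIONT_FRACTION, READ_LENGTH, SYMBIONT_GENOME)

print(f"host: {host_depth:.0f}x, symbiont: {symbiont_depth:.0f}x")
# → host: 19x, symbiont: 1000x
```

Even though 95% of the reads in this toy sample are host reads, every one of them must still be processed by a whole-community assembler, which is the computational burden that a targeted approach tries to avoid.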