Abstract

BackgroundGenome assembly of viruses with high mutation rates, such as Norovirus and other RNA viruses, or from metagenome samples, poses a challenge for the scientific community due to the coexistence of several viral quasispecies and strains. Furthermore, there is no standard method for obtaining whole-genome sequences in non-related patients. After polyA RNA isolation and sequencing in eight patients with acute gastroenteritis, we evaluated two de Bruijn graph assemblers (SPAdes and MEGAHIT), combined with four different and common pre-assembly strategies, and compared those yielding whole genome Norovirus contigs.ResultsReference-genome guided strategies with both host and target virus did not present any advantages compared to the assembly of non-filtered data in the case of SPAdes, and in the case of MEGAHIT, only host genome filtering presented improvements. MEGAHIT performed better than SPAdes in most samples, reaching complete genome sequences in most of them for all the strategies employed. Read binning with CD-HIT improved assembly when paired with different analysis strategies, and more notably in the case of SPAdes.ConclusionsNot all metagenome assemblies are equal and the choice in the workflow depends on the species studied and the prior steps to analysis. We may need different approaches even for samples treated equally due to the presence of high intra host variability. We tested and compared different workflows for the accurate assembly of Norovirus genomes and established their assembly capacities for this purpose.

Highlights

  • Genome assembly of viruses with high mutation rates, such as Norovirus and other RNA viruses, or from metagenome samples, poses a challenge for the scientific community due to the coexistence of several viral quasispecies and strains

  • Raw data obtained from eight human Norovirus samples passed FASTQC (v0.11.5, Babraham Bioinformatics) quality filters regarding the parameters per base sequence quality, per sequence average quality, N content and adapter sequences after the trimming steps described in the methods section

  • After the characterization with Norovirus genotyping tool of the longest NoV contig obtained by means of BLAST against the selected GenBank references (Additional File 1; filtered BLAST results for contigs >7 kb), completeness was assessed using checkV [35]

Read more

Summary

Introduction

Genome assembly of viruses with high mutation rates, such as Norovirus and other RNA viruses, or from metagenome samples, poses a challenge for the scientific community due to the coexistence of several viral quasispecies and strains. Among viruses with high mutation rates and presence of large numbers of genetic variations or quasispecies are Noroviruses, a group of the Caliciviridae family They are positive and single-stranded RNA viruses without a lipid envelope, with genome lengths varying from 7.5 to 7.7 Kb [13]. These viruses are known to be highly pathogenic and infectious, and are responsible for most cases of acute gastroenteritis (50 % of all outbreaks worldwide), symptoms are generally non-lethal [14]. Of the ten currently identified genogroups, in humans the represented genogroups causing infections are GI, GII, GIV, and recently, GVIII and GIX [17] These genogroups are established based on a minimum 43 % difference between VP1-coding sequences [17, 18]. As with other RNA viruses, the intra-genus variability of Noroviruses provides them with fast evolving capacity and makes their characterization and sequencing challenging [21,22,23]

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call