Abstract

Viral metagenomics (virome) studies have yielded an unprecedented number of novel sequences, which are essential for recognizing and characterizing the etiological agents and origins of emerging infectious diseases. Several tools and pipelines have been developed to date for the identification and assembly of viral genomes. Assembly pipelines often produce viral genomes contaminated with host genetic material, some of which are currently deposited in public databases. In the current report, we present a group of deposited sequences that contain ribosomal RNA (rRNA) contamination. We highlight the detrimental role that chimeric next-generation sequencing reads, spanning host rRNA and viral sequences, play in virus genome assembly, and we present the hindrances these reads may pose to current methodologies. We have further developed a refining pipeline, the Zero Waste Algorithm (ZWA), that assists in the assembly of low-abundance viral genomes. ZWA performs context-dependent trimming of chimeric reads, precisely removing their rRNA moiety. These otherwise discarded reads were fed to the assembly pipeline and assisted in the construction of larger and cleaner contigs, making a substantial impact on current assembly methodologies. The ZWA pipeline may significantly enhance virus genome assembly from low-abundance samples and in virus metagenomics approaches in which a small number of reads determines genome quality and integrity.
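The idea of trimming a chimeric read rather than discarding it can be illustrated with a minimal sketch. This is not the ZWA implementation: the function names `merge_intervals` and `trim_rrna`, the half-open interval representation of rRNA hits, the rule of keeping the longest non-rRNA segment, and the `min_len` threshold are all assumptions made for illustration. The rRNA coordinates are assumed to come from a prior alignment of the read against an rRNA reference.

```python
from typing import List, Optional, Tuple

# Half-open [start, end) coordinates of an rRNA match on the read (assumed format).
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def trim_rrna(read: str, rrna_hits: List[Interval], min_len: int = 30) -> Optional[str]:
    """Return the longest segment of `read` not covered by rRNA hits.

    Returns None when the surviving segment is shorter than `min_len`
    (a hypothetical cutoff, not a value taken from the paper).
    """
    best = ""
    cursor = 0
    for start, end in merge_intervals(rrna_hits):
        segment = read[cursor:start]  # sequence between the previous hit and this one
        if len(segment) > len(best):
            best = segment
        cursor = end
    tail = read[cursor:]  # sequence after the last rRNA hit
    if len(tail) > len(best):
        best = tail
    return best if len(best) >= min_len else None


if __name__ == "__main__":
    # Toy chimeric read: 40 "viral" bases followed by 60 "rRNA" bases.
    chimeric = "A" * 40 + "G" * 60
    print(trim_rrna(chimeric, [(40, 100)]))
```

In this sketch a read that would otherwise be dropped as rRNA-contaminated contributes its non-rRNA portion to the assembly input, provided enough sequence remains after trimming.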

Highlights

  • Identification of viruses through next-generation sequencing (NGS) relies on the use of well-curated viral databases [1]

  • Using the Basic Local Alignment Search Tool (BLAST) [22], these entries were found to contain host RNA sequences, the vast majority of which were annotated as host ribosomal RNAs

  • Screening GenBank for chimeric assemblies that contained ribosomal RNA (rRNA) stretches derived from various host organisms, we identified 38 erroneous entries

Summary

Introduction

Identification of viruses through next-generation sequencing (NGS) relies on the use of well-curated viral databases [1]. Recent advances in, and the broad application of, viral genome assembly from unbiased virus screening have yielded an unprecedented number of novel complete or partial genomes [2]. NGS has been essential in the discovery of novel viruses and in the analysis of complex samples in which multiple viruses are present. RNA-seq analysis and virus genome assembly from biological and environmental materials containing novel human and animal viruses, such as SARS-CoV, have greatly assisted their prompt and unbiased characterization [3]. NGS analysis of complex clinical or environmental samples has introduced the “virome” [5,6,7], a new concept of the metagenomics era.
