Abstract

Background: Several bioinformatics pipelines have been developed to detect sequences from viruses that integrate into the human genome because of the health relevance of these integrations, such as in the persistence of viral infection and/or in generating genotoxic effects, often progressing into cancer. Recent genomics and metagenomics analyses have shown that viruses also integrate into the genomes of non-model organisms (e.g., arthropods, fish, plants, vertebrates). However, studies of endogenous viral elements (EVEs) in non-model organisms have rarely gone beyond characterizing them from reference genome assemblies. In non-model organisms, we lack a thorough understanding of the widespread occurrence of EVEs and their biological relevance, apart from sporadic cases which nevertheless point to significant roles of EVEs in immunity and regulation of expression. The co-occurrence of repetitive DNA, duplications and/or assembly fragmentation in a genome sequence with intrasample variability in whole-genome sequencing (WGS) data can cause misalignments when mapping data to a genome assembly. This phenomenon hinders our ability to properly identify integration sites.

Results: To fill this gap, we developed ViR, a pipeline that resolves the dispersion of reads due to intrasample variability in sequencing data from both single and pooled DNA samples, thus improving the detection of integration sites. We tested ViR on both in silico and real sequencing data from a non-model organism, the arboviral vector Aedes albopictus. Potential viral integrations predicted by ViR were molecularly validated, supporting the accuracy of ViR results.

Conclusion: ViR will open new avenues to explore the biology of EVEs, especially in non-model organisms. Importantly, while we developed ViR with the identification of EVEs in mind, its application can be extended to detect any lateral transfer event, provided that an ad hoc sequence to interrogate is supplied.

Highlights

  • Several bioinformatics pipelines have been developed to detect sequences from viruses that integrate into the human genome because of the health relevance of these integrations, such as in the persistence of viral infection and/or in generating genotoxic effects, often progressing into cancer

  • ViR works downstream of any currently available endogenous viral element (EVE) prediction tool, using paired-end reads to improve the characterization of integration sites by resolving the dispersion of reads in genome sequences that are rich in repetitive DNA

  • Host reads supporting a viral integration will distribute across these “equivalent” mapping genomic positions, and the signal for the integration site, expressed in terms of host read coverage, may not reach the threshold of detection. This situation is exacerbated in non-model organisms, whose genome assemblies are rich in repetitive DNA and may have the same sequence assembled into different contigs or scaffolds
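
The read-dispersion problem described in the last highlight can be illustrated with a toy example. The sketch below is not ViR's implementation; it only shows, under stated assumptions, why per-position coverage can miss an integration that becomes detectable once "equivalent" positions are pooled. The contig names, read counts, the grouping of equivalent positions and the detection threshold are all hypothetical values chosen for illustration.

```python
from collections import Counter

# Hypothetical detection threshold (minimum number of supporting host reads).
THRESHOLD = 5


def coverage_per_site(read_positions):
    """Count supporting host reads at each mapped position independently."""
    return Counter(read_positions)


def merged_coverage(read_positions, equivalence_groups):
    """Count supporting host reads after pooling positions that are assumed
    to be equivalent (e.g. copies of the same repeat on different contigs)."""
    # Map every position to the identifier of its equivalence group.
    site_to_group = {
        site: group_id
        for group_id, sites in equivalence_groups.items()
        for site in sites
    }
    merged = Counter()
    for pos in read_positions:
        # Positions outside any group are counted on their own.
        merged[site_to_group.get(pos, pos)] += 1
    return merged


def sites_above_threshold(coverage, threshold=THRESHOLD):
    """Return the sites (or groups) whose read support reaches the threshold."""
    return sorted(site for site, count in coverage.items() if count >= threshold)


# Toy data: seven host reads support one integration, but the aligner has
# spread them over three equivalent positions on different contigs.
reads = ["contig_1:1000"] * 3 + ["contig_7:250"] * 2 + ["contig_9:40"] * 2
groups = {"group_A": ["contig_1:1000", "contig_7:250", "contig_9:40"]}

per_site = coverage_per_site(reads)
pooled = merged_coverage(reads, groups)

detected_per_site = sites_above_threshold(per_site)
detected_pooled = sites_above_threshold(pooled)
```

In this toy case no single position reaches the threshold, while the pooled group does. How equivalent positions are actually identified and grouped is the hard part and is not covered by this sketch.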


Summary

Introduction

Several bioinformatics pipelines have been developed to detect sequences from viruses that integrate into the human genome because of the health relevance of these integrations, such as in the persistence of viral infection and/or in generating genotoxic effects, often progressing into cancer. The genomes of organisms as different as arthropods, fish, snakes, birds, vertebrates and plants were shown to host EVEs, which derive from DNA viruses and retroviruses, and from nonretroviral RNA viruses [14,15,16,17,18,19,20,21,22]. In these non-model organisms, EVEs range widely in number and tend to occur in repetitive DNA, mostly in association with transposable element (TE) sequences [20, 23, 24]. The performance of ViR was tested using in silico WGS data.

