Eukaryotic genomes harbour sequences derived from non-retroviral RNA viruses, known as endogenous viral elements (EVEs) or non-retroviral integrated RNA virus sequences (NIRVS). These sequences represent a record of past infections and have been implicated in the host antiviral response. We have created a program to identify viral sequences integrated in a host genome. It takes a specimen BAM file as input and outputs candidate NIRVS, along with putative host insertion sites and the host genomic features that overlap them, in XML and visual formats, with minimal user intervention at intermediate steps. We used this software to analyse short-read data derived from the genomes of 222 wild-caught Aedes aegypti mosquitoes from a dozen geographical regions, and located putative NIRVS from seven virus families. The program is as accurate as currently available software for NIRVS detection, and represents a significant improvement in adaptability and user-friendliness. Furthermore, the flexibility of this pipeline allows the user to search for sequence integrations across the genome of any organism, as long as a query sequence database and a reference genome are provided. Potential extended applications include identification of integrated transgenic sequences used for research or vector control strategies.