Abstract

Rhabdoviruses infect a large number of plant species and cause significant crop diseases. They have a negative-sense, single-stranded unsegmented or bisegmented RNA genome. The number of plant-associated rhabdovirid sequences has grown in the last few years in concert with the extensive use of high-throughput sequencing platforms. Here, we report the discovery of 27 novel rhabdovirus genomes associated with 25 different host plant species and one insect, which were hidden in public databases. These viral sequences were identified through homology searches in more than 3000 plant and insect transcriptomes from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) using known plant rhabdovirus sequences as the query. The identification, assembly and curation of raw SRA reads resulted in sixteen viral genome sequences with full-length coding regions and ten partial genomes. Highlights of the obtained sequences include viruses with unique and novel genome organizations among known plant rhabdoviruses. Phylogenetic analysis showed that thirteen of the novel viruses were related to cytorhabdoviruses, one to alphanucleorhabdoviruses, five to betanucleorhabdoviruses, one to dichorhaviruses and seven to varicosaviruses. These findings resulted in the most complete phylogeny of plant rhabdoviruses to date and shed new light on the phylogenetic relationships and evolutionary landscape of this group of plant viruses. Furthermore, this study provided additional evidence for the complexity and diversity of plant rhabdovirus genomes and demonstrated that analyzing SRA public data provides an invaluable tool to accelerate virus discovery, gain evolutionary insights and refine virus taxonomy.

Highlights

  • The costs for high-throughput sequencing (HTS) have been significantly reduced each year due to advances in sequencing technologies; the number of genome and transcriptome sequencing projects has been steadily increasing, resulting in a massive number of nucleotides deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI)

  • Based on genome organization and sequence identity, the novel viruses were tentatively assigned to the established plant rhabdovirus genera Alphanucleorhabdovirus, Betanucleorhabdovirus, Cytorhabdovirus, Dichorhavirus and Varicosavirus

  • We queried the publicly available plant transcriptome datasets in the transcriptome shotgun assembly (TSA) database hosted at NCBI, which resulted in the identification of 27 novel plant rhabdoviruses

Introduction

The costs for high-throughput sequencing (HTS) have been significantly reduced each year due to advances in sequencing technologies; the number of genome and transcriptome sequencing projects has been steadily increasing, resulting in a massive number of nucleotides deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). Over 16,000 petabases (10^15 bases) have been deposited in the SRA, with over 6000 petabases available as open-access data [1]. This large amount of data has posed significant challenges for data storage, bioinformatic analysis and management. Many viruses, either not known to induce any apparent symptoms in their host or lacking a known host, have been identified from diverse environments using metagenomic approaches. This has highlighted our limited knowledge about the richness of a continuously expanding plant virosphere, which appears highly diverse in every potential host assessed so far [2,3,4,5]. The great number of viruses recently discovered by HTS, a minuscule portion of the virosphere, allowed a first glimpse of the path to a comprehensive megataxonomy of the virus world [6].
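To put the SRA figures quoted above in perspective, the following minimal sketch converts them to absolute base counts and an open-access share. The variable names are illustrative, and because the text gives both figures as lower bounds ("over"), the results are rough estimates only.

```python
# Rough scale of the SRA figures quoted in the text.
# Both inputs are lower bounds ("over ..."), so treat the outputs as approximate.
PETABASE = 10**15  # bases per petabase

total_petabases = 16_000        # deposited in the SRA (from the text)
open_access_petabases = 6_000   # available as open-access data (from the text)

total_bases = total_petabases * PETABASE
open_access_bases = open_access_petabases * PETABASE
open_access_fraction = open_access_petabases / total_petabases

print(f"Total deposited: {total_bases:.1e} bases")
print(f"Open access:     {open_access_bases:.1e} bases")
print(f"Open-access share: {open_access_fraction:.1%}")
```

On these figures, a little over a third of the deposited bases are openly available for searches such as the one described here.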
