VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses

Jiarong Guo, Simon Roux, Tom O Delmont, Dean Vik, M Consuelo Gazitúa, Arvind Varsani, Ben Bolduc, Akbar Adjie Pratama, Ahmed A Zayed, Matthew B Sullivan, Guillermo Dominguez-Huerta

doi:10.1186/s40168-020-00990-y

Abstract

Background: Viruses are a significant player in many biosphere and human ecosystems, but most signals remain “hidden” in metagenomic/metatranscriptomic sequence datasets due to the lack of universal gene markers, database representatives, and insufficiently advanced identification tools.

Results: Here, we introduce VirSorter2, a DNA and RNA virus identification tool that leverages genome-informed database advances across a collection of customized automatic classifiers to improve the accuracy and range of virus sequence detection. When benchmarked against genomes from both isolated and uncultivated viruses, VirSorter2 uniquely performed consistently with high accuracy (F1-score > 0.8) across viral diversity, while all other tools under-detected viruses outside of the group most represented in reference databases (i.e., those in the order Caudovirales). Among the tools evaluated, VirSorter2 was also uniquely able to minimize errors associated with atypical cellular sequences including eukaryotic genomes and plasmids. Finally, as the virosphere exploration unravels novel viral sequences, VirSorter2’s modular design makes it inherently able to expand to new types of viruses via the design of new classifiers to maintain maximal sensitivity and specificity.

Conclusion: With multi-classifier and modular design, VirSorter2 demonstrates higher overall accuracy across major viral groups and will advance our knowledge of virus evolution, diversity, and virus-microbe interaction in various ecosystems. Source code of VirSorter2 is freely available (https://bitbucket.org/MAVERICLab/virsorter2), and VirSorter2 is also available both on bioconda and as an iVirus app on CyVerse (https://de.cyverse.org/de).
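The F1-score threshold quoted in the abstract can be made concrete with a small calculation. The sketch below computes F1 as the harmonic mean of precision and recall from raw counts; the function name and the example counts are hypothetical and are not taken from the paper's benchmarks.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    tp: viral sequences correctly called viral
    fp: non-viral sequences wrongly called viral
    fn: viral sequences that were missed
    """
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
example = f1_score(tp=85, fp=10, fn=15)
```

Because F1 penalizes both missed viruses and false calls, a tool cannot reach a high score by being only sensitive or only specific.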

Highlights

Microbes are widely recognized as driving nutrient and energy cycles that fuel marine and terrestrial ecosystems [1, 2], directly influencing human health and disease, and controlling the output of engineered ecosystems [3].
One set of tools relies on a combination of gene content and genomic structural features to distinguish viral from microbial sequences, including Prophinder [27], PhiSpy [28], VirSorter [29], the Earth’s Virome pipeline [17], PHASTER [30], MARVEL [31], and VIBRANT [32].
These genomic features are either statistically compared to a null model (Prophinder, VirSorter, PHASTER), or more recently have been used as input for automatic machine-learning classifiers (MARVEL and VIBRANT).

Summary

Introduction

Microbes are widely recognized as driving nutrient and energy cycles that fuel marine and terrestrial ecosystems [1, 2], directly influencing human health and disease, and controlling the output of engineered ecosystems [3]. The other approach uses the frequencies of DNA “words” (i.e., k-mers) found in known viral and cellular genomes as signatures to train machine-learning classifiers to recognize new viral and microbial sequences (e.g., VirFinder and DeepVirFinder [33, 34]). Both approaches efficiently detect common viruses that are well represented in databases, such as dsDNA bacteriophages from the Caudovirales order [31, 32], but they struggle with less well-documented viruses like ssDNA viruses [35], RNA viruses [36, 37], and viruses that infect archaea [38, 39]. Viruses are a significant player in many biosphere and human ecosystems, but most signals remain “hidden” in metagenomic/metatranscriptomic sequence datasets due to the lack of universal gene markers, database representatives, and insufficiently advanced identification tools.
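The k-mer signature idea described above can be sketched in a few lines. This is only an illustration of turning a sequence into a k-mer frequency vector, assuming a plain A/C/G/T alphabet and a hypothetical function name; it is not the feature extraction used by VirFinder or DeepVirFinder, whose details are not given here.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq: str, k: int = 4) -> dict[str, float]:
    """Return the relative frequency of every possible k-mer in seq.

    Windows containing characters outside A/C/G/T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(ALPHABET)
    )
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    # Fixed-length vector: k-mers that never occur get frequency 0.0.
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Toy sequence for illustration; real inputs would be contigs or genomes.
profile = kmer_frequencies("ATGCGATATGC", k=2)
```

A vector like `profile`, computed for many known viral and cellular sequences, is the kind of input a classifier could be trained on. Such signatures reflect what is in the training set, which is consistent with the text's point that poorly represented viral groups are harder to detect.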

Journal: Microbiome	Publication Date: Feb 1, 2021
Citations: 603	License type: open-access
