LABRADOR—A Computational Workflow for Virus Detection in High-Throughput Sequencing Data

Izabela Fabiańska, Dietmar Mayer, Benjamin Richter, Hon Q Tran, Stefan Borutzki, Andreas Neubert

doi:10.3390/v13122541


Open Access

https://doi.org/10.3390/v13122541


Journal: Viruses	Publication Date: Dec 18, 2021
Citations: 1	License type: CC BY 4.0

Affiliation: IDT Biologika (Germany)

Abstract

High-throughput sequencing (HTS) allows detection of known and unknown viruses in samples of broad origin. This makes HTS well suited to determining whether biological products, such as vaccines, are free from adventitious agents, and it could support or replace extensive testing with various in vitro and in vivo assays. Due to bioinformatics complexities, there is a need for standardized and reliable methods to manage HTS-generated data in this field. Thus, we developed LABRADOR, an analysis pipeline for adventitious virus detection. The pipeline consists of several third-party programs and is divided into two major parts: (i) direct read classification based on the comparison of characteristic profiles between reads and sequences deposited in the database, supported by alignment to the best-matching reference sequence, and (ii) de novo assembly of contigs and their classification at the nucleotide and amino acid levels. To meet the requirements published in guidelines for the safety of biologicals, we generated a custom nucleotide database of viral sequences. We tested our pipeline on publicly available HTS datasets and showed that LABRADOR can reliably detect viruses in mixtures of model viruses, vaccines and clinical samples.

Highlights

Because the nucleic acid sequence needed to design primers might be unavailable for viruses with poorly annotated genomes or for novel viruses, a potential contaminating virus can be missed with PCR-based methods.
We developed LABRADOR, a computational workflow for virus detection in high-throughput sequencing (HTS) datasets, together with a customized database containing nucleotide sequences of viruses recommended for the evaluation of biological products.
At the beginning of our pipeline, simple quality filtering of reads is performed with a well-established tool, Trimmomatic [45], because we examine the quality of our internal sequencing runs with another tool prior to using LABRADOR.

Summary

Introduction

Production of biologicals, such as viral vaccines, is prone to adventitious, unintentionally introduced contaminants [1]. Because the nucleic acid sequence needed to design primers might be unavailable for viruses with poorly annotated genomes or for novel viruses, a potential contaminating virus can be missed with PCR-based methods. This obstacle can be overcome with high-throughput sequencing (HTS), which can detect any DNA or RNA molecule in a biological sample regardless of its intrinsic sequence. The major disadvantage of this type of classification is that it often produces false positives; the extraction and comparison of discriminating patterns (k-mers) remains an area for algorithmic improvement [21]. Another challenge of virus detection from HTS data is the curation of a database of nucleotide and/or amino acid sequences. LABRADOR is written in Python and incorporates several open-source tools. It detects viruses using a pattern-based classification of sequencing reads coupled with mapping to the corresponding nucleotide sequence, allowing the detection of viruses present at low concentration or as short fragments. LABRADOR efficiently classified viruses from four published datasets, i.e., an in silico generated microbiome, spike virus experiments, vaccines and clinical samples, demonstrating its comprehensiveness for virus detection.
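
The idea of classifying reads by shared discriminating patterns (k-mers) can be illustrated with a minimal Python sketch. This is not LABRADOR's code or the algorithm of any tool it wraps; the function names, the toy reference sequences, the k-mer length of 8 and the `min_hits` threshold are all assumptions chosen for illustration. A read is assigned to the reference that shares the most k-mers with it, and reads with too few shared k-mers are left unclassified, which is one simple way to limit spurious assignments.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of k-mers in seq, skipping any that contain N."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i:i + k]
    }


def build_index(references, k):
    """Map every k-mer to the names of the references that contain it."""
    index = {}
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def classify_read(read, index, k, min_hits=2):
    """Return the reference sharing the most k-mers with the read, or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            votes[name] += 1
    if not votes:
        return None
    name, hits = votes.most_common(1)[0]
    # Require a minimum number of shared k-mers before assigning a label.
    return name if hits >= min_hits else None


# Hypothetical toy references; real databases hold full viral genomes.
references = {
    "virus_A": "ATGCGTACGTTAGCCGATAGCTAGGCTA",
    "virus_B": "TTGACCGGTAACCGTTAGGCATCGATCG",
}
index = build_index(references, k=8)

read_a = "CGTACGTTAGCCGATA"        # taken from virus_A
read_unknown = "GGGGGGGGGGGGGGGG"  # shares no k-mer with either reference

print(classify_read(read_a, index, k=8))
print(classify_read(read_unknown, index, k=8))
```

In a workflow of the kind described here, a k-mer assignment like this would be treated only as a candidate label; mapping the read to the corresponding reference sequence is the step that supports or rejects it.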

Software Environment

LABRADOR Wrapper

Preprocessing

Classification of Viral Sequencing Reads and Mapping to Reference Genome

De Novo Contig Assembly and Classification

Creation of a Custom Viral Database

Evaluation of LABRADOR Workflow on Published

Analyses Performed in LABRADOR Workflow and Custom Database Construction

Detection of AVT Model Viruses from Spiking Study

Detection of Viruses in Datasets from Real-Life Experiments

Discussion
