Considerations for Optimization of High-Throughput Sequencing Bioinformatics Pipelines for Virus Detection.

Christophe Lambert, Cassandra Braxton, Daniel Rozelle, Brandye Michaels, Arifa Khan, Fabio La Neve, Paul Duncan, Robert Charlebois, Heather Malicki, Avisek Deyati, Sebastien Ribrioux, Zhihui Yang, Wenping Sun

doi:10.3390/v10100528

Abstract

High-throughput sequencing (HTS) has demonstrated capabilities for broad virus detection based upon discovery of known and novel viruses in a variety of samples, including clinical, environmental, and biological. An important goal for HTS applications in biologics is to establish parameter settings that can afford adequate sensitivity at an acceptable computational cost (computation time, computer memory, storage, expense, and/or efficiency) at critical steps in the bioinformatics pipeline, including initial data quality assessment, trimming/cleaning, and assembly (to reduce data volume and increase likelihood of appropriate sequence identification). Additionally, the quality and reliability of the results depend on the availability of a complete and curated viral database; selection and configuration of sequence alignment programs that retain specificity for broad virus detection with reduced false-positive signals; removal of host sequences without loss of endogenous viral sequences of interest; and use of a meaningful reporting format that retains critical information from the analysis for presentation of readily interpretable data and actionable results. Furthermore, after alignment, both automated and manual evaluation may be needed to verify the results and help assign a potential risk level to residual, unmapped reads. We hope that the collective considerations discussed in this paper aid toward optimization of data analysis pipelines for virus detection by HTS.

Highlights

Eukaryotic cells have been used to produce biological medicinal products since the early 1950s, based on their broad susceptibility to grow vaccine viruses.
While mechanisms of target enrichment and background removal are beyond the scope of this paper [11], when we explore the effect of improving this signal-to-noise ratio (S/N) by 1000× (Figure 1, closed squares), similar levels of sensitivity could be obtained using a more modest platform (MiSeq, ~1.2 × 10^7 paired-end reads).
Considerations discussed include experimental design, factors related to detection sensitivity, sequencing platform and file format, data processing, database and reference genomes, de novo assembly and read mapping algorithms, post-processing of unmapped reads, and final report format.
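
The relationship between S/N and sequencing depth in the second highlight can be illustrated with a little arithmetic. The sketch below is not the model used in the paper: it assumes that viral reads are sampled independently, so that the number of viral reads in a run is approximately Poisson distributed with mean (total reads × viral fraction). The baseline viral fraction of 1e-9 and the function names are hypothetical, chosen only for illustration; the MiSeq read count and the 1000× factor are the figures quoted above.

```python
import math


def expected_viral_reads(total_reads: float, viral_fraction: float) -> float:
    """Mean number of viral reads if each read is viral with probability viral_fraction."""
    return total_reads * viral_fraction


def detection_probability(total_reads: float, viral_fraction: float, min_reads: int = 1) -> float:
    """Probability of observing at least min_reads viral reads (Poisson assumption)."""
    lam = expected_viral_reads(total_reads, viral_fraction)
    # P(X < min_reads) is the sum of the Poisson terms for k = 0 .. min_reads - 1
    below = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(min_reads))
    return 1.0 - below


MISEQ_READS = 1.2e7        # ~1.2 x 10^7 paired-end reads, as quoted in the highlight
BASELINE_FRACTION = 1e-9   # hypothetical fraction of reads that are viral before enrichment
ENRICHMENT = 1000          # the 1000x S/N improvement explored in the highlight

p_before = detection_probability(MISEQ_READS, BASELINE_FRACTION)
p_after = detection_probability(MISEQ_READS, BASELINE_FRACTION * ENRICHMENT)

print(f"expected viral reads before enrichment: {expected_viral_reads(MISEQ_READS, BASELINE_FRACTION):.3f}")
print(f"expected viral reads after enrichment:  {expected_viral_reads(MISEQ_READS, BASELINE_FRACTION * ENRICHMENT):.3f}")
print(f"P(detect >= 1 read) before: {p_before:.4f}, after: {p_after:.6f}")
```

Under these assumptions, a 1000× gain in viral fraction has the same effect on the expected viral read count as a 1000× increase in sequencing depth, which is why enrichment can offset a smaller platform. Real pipelines typically require more than one read, or a minimum genome coverage, before calling a hit; the `min_reads` parameter is a simple stand-in for such a threshold.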

Summary

Introduction

Eukaryotic cells have been used to produce biological medicinal products since the early 1950s, based on their broad susceptibility to grow vaccine viruses. The capabilities of HTS for broad virus detection in biologics are evidenced by the unexpected detection of viruses in a vaccine and in production bioreactors, and the discovery of novel viruses in various cell lines, including insect and mammalian [5,6,7,8]. As with any nucleic acid-based virus detection method, HTS only indicates the presence of viral sequences; follow-up is needed to experimentally confirm the results and exclude potential laboratory and reagent contamination, as well as to assess the biological relevance and significance of the result for decision-making. Discussions on sample selection and preparation, as well as virus spiking to determine sensitivity, are presented in the context of their influence on the pipeline analysis; details are discussed in another paper from members of the Advanced Virus Detection Technologies Interest Group (AVDTIG) [11].

Factors Influencing Sensitivity of Virus Detection

Upstream Preparation of the Biological Sample

Sequencing

Bioinformatics Pipeline and Databases

Sequencing Platform and Output Files

Data Analysis Pipeline Design

Database Selection

Reference Subtraction and Counter-Screen

Processing of Viral Hits

Unmapped Sequences

Data Captured in Raw Output

Final Reporting Format

Conclusions


Journal: Viruses	Publication Date: Sep 27, 2018
Citations: 21	License type: CC BY 4.0


