Abstract

High-throughput sequencing (HTS) has demonstrated capabilities for broad virus detection based upon discovery of known and novel viruses in a variety of samples, including clinical, environmental, and biological. An important goal for HTS applications in biologics is to establish parameter settings that can afford adequate sensitivity at an acceptable computational cost (computation time, computer memory, storage, expense or/and efficiency), at critical steps in the bioinformatics pipeline, including initial data quality assessment, trimming/cleaning, and assembly (to reduce data volume and increase likelihood of appropriate sequence identification). Additionally, the quality and reliability of the results depend on the availability of a complete and curated viral database for obtaining accurate results; selection of sequence alignment programs and their configuration, that retains specificity for broad virus detection with reduced false-positive signals; removal of host sequences without loss of endogenous viral sequences of interest; and use of a meaningful reporting format, which can retain critical information of the analysis for presentation of readily interpretable data and actionable results. Furthermore, after alignment, both automated and manual evaluation may be needed to verify the results and help assign a potential risk level to residual, unmapped reads. We hope that the collective considerations discussed in this paper aid toward optimization of data analysis pipelines for virus detection by HTS.

Highlights

  • Eukaryotic cells have been used to produce biological medicinal products since the early 1950s, based on their broad susceptibility to grow vaccine viruses

  • While mechanisms of target enrichment and background removal are beyond the scope of this paper [11], when we explore the effect of improving this signal-to-noise ratio (S/N) by 1000× (Figure 1, closed squares), similar levels of sensitivity could be obtained using a more modest platform (MiSeq, ~1.2 × 107 paired-end reads)

  • Have been discussed, including experimental design, factors related to detection sensitivity, sequencing platform and file format, data processing, database and reference genomes, de novo assembly and read mapping algorithms, post-processing of unmapped reads, and final report format

Read more

Summary

Introduction

Eukaryotic cells have been used to produce biological medicinal products since the early 1950s, based on their broad susceptibility to grow vaccine viruses. The capabilities of HTS for broad virus detection in biologics are evidenced by the unexpected detection of viruses in a vaccine and in production bioreactors, and the discovery of novel viruses in various cell lines, including insect and mammalian [5,6,7,8]. As with any nucleic acid-based virus detection method, HTS only indicates the presence of viral sequences, and needs further follow-up to experimentally confirm the results and exclude potential laboratory and reagent contamination, as well as to assess the biological relevance and significance of the result for decision-making. Discussions on sample selection and preparation, as well as virus spiking to determine sensitivity, are presented in the context of their influence on the pipeline analysis and details are discussed in another paper from AVDTIG members [11]

Factors Influencing Sensitivity of Virus Detection
Upstream Preparation of the Biological Sample
Sequencing
Bioinformatics Pipeline and Databases
Sequencing Platform and Output Files
Data Analysis Pipeline Design
Database Selection
Reference Subtraction and Counter-Screen
Processing of Viral Hits
Unmapped Sequences
Data Captured in Raw Output
Final Reporting Format
Conclusions

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.