Abstract

Adventitious agent detection during the production of vaccines and biotechnology-based medicines is of critical importance to ensure the final product is free from any possible viral contamination. Increasing the speed and accuracy of viral detection helps accelerate development timelines and ensure patient safety. Here, several rapid viral metagenomics approaches were tested on simulated next-generation sequencing (NGS) data sets and on existing data sets from virus spike-in studies done in CHO-K1 and HeLa cell lines. For these data sets, the rapid methods had sensitivity comparable to that of the full-read alignment methods used for NGS viral detection, but their specificity could be improved. A method that first filters host reads using KrakenUniq and then selects the virus classification tool based on the number of remaining reads is suggested as the preferred approach among those tested to detect nonlatent and nonendogenous viruses. Such an approach shows reasonable sensitivity and specificity for the data sets examined and requires less time and memory than full-read alignment methods.

IMPORTANCE Next-generation sequencing (NGS) has been proposed as a method complementary to current in vivo and in vitro methods for detecting adventitious viruses in the production of biotherapeutics and vaccines. Before NGS can be established in industry as a main viral detection technology, further investigation is needed into the bioinformatics analyses required to identify and classify viral NGS reads. In this study, the ability of rapid metagenomics tools to detect viruses in biopharmaceutically relevant samples is tested and compared in order to recommend an efficient approach. The results showed that KrakenUniq can quickly and accurately filter host sequences and classify viral reads, with sensitivity and specificity comparable to slower full-read alignment approaches, such as BLASTn, for the data sets examined.

Highlights

  • While virus identification and clearance are required during the manufacturing of vaccines, biologics, and biotechnology-based medicines, several virus contamination events have occurred in both vaccine and biotherapeutic protein production

  • This percentage of reads was 80% rather than the 91 to 99% obtained from sequences with the other similarity levels. This was the case for several simulated read sets, and each time about 20% of the reads were classified into the correct genus but could not be further classified at the species level. These results suggest that both KrakenUniq and BLASTn are sensitive enough to classify the large majority of reads as the correct virus species, even when reads are from virus sequences with only 95% similarity to the corresponding viral sequence in the Reference Viral Database (RVDB)

  • Results on the real next-generation sequencing (NGS) data sets suggest that the best approach to classify NGS reads quickly as viral/nonviral is to first filter host reads using KrakenUniq, followed by classification of the remaining reads using either KrakenUniq or BLASTn depending on the scale of remaining data
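The two-stage workflow described in the last highlight can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the read-count cutoff is a placeholder because the text here gives no value, and the direction of the choice (the slower full-read aligner for small read sets, the k-mer tool for large ones) is inferred from the description of BLASTn as the slower method. The host labels are assumed to come from an earlier host-filtering run; no classifier is invoked here.

```python
from typing import Iterable, List

# Hypothetical cutoff: the text does not state the read count at which
# one tool is preferred over the other.
ASSUMED_MAX_READS_FOR_BLASTN = 100_000


def filter_host_reads(read_ids: Iterable[str], host_read_ids: Iterable[str]) -> List[str]:
    """Return the reads that were not labelled as host, preserving input order."""
    host = set(host_read_ids)
    return [read_id for read_id in read_ids if read_id not in host]


def choose_classifier(n_remaining: int,
                      max_reads_for_blastn: int = ASSUMED_MAX_READS_FOR_BLASTN) -> str:
    """Pick a classification tool from the number of non-host reads.

    Assumption: the slower full-read aligner is only practical when
    few reads remain after host filtering.
    """
    if n_remaining < 0:
        raise ValueError("read count cannot be negative")
    return "BLASTn" if n_remaining <= max_reads_for_blastn else "KrakenUniq"


if __name__ == "__main__":
    reads = ["r1", "r2", "r3", "r4"]
    host_labelled = ["r2", "r4"]  # toy output of a host-filtering step
    remaining = filter_host_reads(reads, host_labelled)
    print(remaining, choose_classifier(len(remaining)))
```

Keeping the cutoff as a parameter makes it easy to tune against available time and memory, which is the trade-off the highlight points to.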

Summary

Introduction

While virus identification and clearance are required during the manufacturing of vaccines, biologics, and biotechnology-based medicines, several virus contamination events have occurred in both vaccine and biotherapeutic protein production. Host filtering is often recommended because it decreases the background noise caused by host genome reads when identifying virus sequences and speeds up the subsequent classification of nonhost reads [27, 28]. Another key decision is whether to use full sequence alignment methods to find homology or to use k-mer-based approaches. Kraken-derived tools (Kraken, Kraken2, and KrakenUniq) and Centrifuge (using custom databases) have shown high precision and recall in previous benchmarking studies [40, 41]. Although these studies compared the performance of k-mer classifiers on metagenomics samples containing tens to hundreds of bacterial species, the authors are unaware of any comparison among these tools for adventitious virus detection. Virus genomes can be significantly shorter than bacterial genomes and often have slightly higher gene densities [42, 43], which affects the k-mer signatures created and used during read classification.
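To make the idea of a k-mer signature concrete, the following toy sketch indexes the k-mers of a few reference sequences and assigns a read to the reference with which it shares the most reference-specific k-mers. It is an illustration only, not how Kraken, KrakenUniq, or Centrifuge work internally (those tools map k-mers onto a taxonomy and use far more compact data structures). The function names, the sequences, and the choice of k = 4 are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the names of the references that contain it."""
    index: Dict[str, Set[str]] = {}
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def classify_read(read: str, index: Dict[str, Set[str]], k: int) -> Optional[str]:
    """Assign a read to the reference with the most reference-specific k-mer hits."""
    votes: Counter = Counter()
    for kmer in kmers(read, k):
        owners = index.get(kmer, set())
        if len(owners) == 1:  # only k-mers found in exactly one reference vote
            votes[next(iter(owners))] += 1
    if not votes:
        return None  # no informative k-mer matched: leave the read unclassified
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy sequences, not real viral genomes.
    references = {"virusA": "ACGTACGTTAGC", "virusB": "TTTTGGGGCCCC"}
    index = build_index(references, k=4)
    print(classify_read("ACGTTAGC", index, k=4))
    print(classify_read("AAAAAAAA", index, k=4))
```

In this toy setting, a shorter reference contributes fewer distinct k-mers to the index, which gives some intuition for why genome length can matter to k-mer-based classification.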

