Applied metagenomics is a powerful emerging capability that enables the untargeted detection of pathogens, and its application in clinical diagnostics promises to overcome the limitations of current targeted assays. Although metagenomics offers a hypothesis-free approach to identifying any pathogen, including unculturable and potentially novel pathogens, its clinical use has so far been limited by workflow-specific requirements, computational constraints, and lengthy expert review. To address these challenges, we developed UltraSEQ, a first-of-its-kind, accurate, and scalable metagenomic bioinformatic tool with potential utility in clinical diagnostics and biosurveillance. Here, we evaluate UltraSEQ using an in silico-synthesized metagenome, mock microbial community data sets, and publicly available clinical data sets from samples of different infection types, including both short-read and long-read sequencing data. UltraSEQ detected all expected species across the tree of life in the in silico sample and all 10 bacterial and fungal species in the mock microbial community data set. For the clinical data sets, UltraSEQ achieved an overall accuracy of 91% without data set-specific configuration changes, background sample subtraction, or prior sample information. Furthermore, as an initial demonstration with a limited set of patient samples, we show that UltraSEQ provides antibiotic resistance and virulence factor genotypes that are consistent with phenotypic results.
Taken together, these results demonstrate that the UltraSEQ platform offers a transformative approach to microbial and metagenomic sample characterization. It employs biologically informed detection logic, deep metadata, and a flexible system architecture to classify and characterize taxonomic origin, gene function, and user-defined functions, including disease-causing infections.

IMPORTANCE Traditional clinical microbiology diagnostic tests rely either on targeted methods that can detect only one to a few preselected organisms or on slow, culture-based methods. Although widely used today, these methods have several limitations, and for several disease types the etiology of infection remains unknown in more than 50% of cases. Advances in sequencing technologies have made it possible to apply metagenomic methods to clinical diagnostics, but current offerings are limited to a specific disease type or sequencer workflow and/or require laboratory-specific controls. These limitations arise because the backend bioinformatic pipelines are optimized for those specific parameters, which has produced an excess of unmaintained, redundant, and niche tools that lack standardization and explainable outputs. In this paper, we demonstrate that UltraSEQ uses a novel, information-based approach that enables accurate, evidence-based predictions for diagnosis as well as the functional characterization of a sample.