A common objective in microbial forensic investigations is to identify the origin of a recovered pathogenic bacterium by DNA sequencing. However, there is currently no consensus on how degrees of belief in such origin hypotheses should be quantified, interpreted, and communicated to wider audiences. To fill this gap, we have developed a concept based on calculating probabilistic evidential values for microbial forensic hypotheses. The likelihood-ratio method underpinning this concept is widely used in other forensic fields, such as human DNA matching, where results are readily interpretable and have been successfully communicated in juridical hearings. The concept was applied to two case scenarios of interest in microbial forensics: (1) identifying source cultures among series of very similar cultures generated by parallel serial passage of the Tier 1 pathogen Francisella tularensis, and (2) finding the production facilities of strains isolated in a real disease outbreak caused by the human pathogen Listeria monocytogenes. Evidential values for the studied hypotheses were computed from signatures derived from whole-genome sequencing data, including deep-sequenced low-frequency variants and structural variants such as duplications and deletions acquired during serial passages. In the F. tularensis case study, we correctly assigned fictive evidence samples to their culture batches of origin on the basis of structural variant data. Evidential values were calculated by setting up relevant hypotheses and using data on cultivated batch sources to define the reference populations under each hypothesis. The results show that extremely similar strains can be separated on the basis of amplified mutational patterns identified by high-throughput sequencing. In the L. monocytogenes scenario, analyses of whole-genome sequence data conclusively assigned the clinical samples to specific sources of origin, and conclusions were formulated to facilitate communication of the findings. Taken together, these findings demonstrate the potential of using bacterial whole-genome sequencing data, including data on both low-frequency SNP signatures and structural variants, to calculate evidential values that facilitate interpretation and communication of the results. The concept could be applied in diverse scenarios, including both epidemiological and forensic source tracking of bacterial infectious disease outbreaks.
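To make the likelihood-ratio idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each sample is reduced to a set of present variant markers, that markers are statistically independent, and that marker frequencies in each reference population are estimated with add-one smoothing. None of these modelling choices is stated in the abstract, and the function names and marker labels are hypothetical.

```python
import math
from typing import Dict, List, Set


def marker_frequencies(reference: List[Set[str]], markers: Set[str]) -> Dict[str, float]:
    """Estimate the presence frequency of each marker in a reference population.

    Add-one (Laplace) smoothing keeps every frequency strictly between 0 and 1,
    so that no single marker drives the likelihood to exactly zero.
    This smoothing is an assumption of the sketch.
    """
    n = len(reference)
    return {
        m: (sum(m in sample for sample in reference) + 1) / (n + 2)
        for m in markers
    }


def log10_likelihood(evidence: Set[str], freqs: Dict[str, float]) -> float:
    """log10 P(evidence | hypothesis), treating markers as independent (assumption)."""
    total = 0.0
    for marker, freq in freqs.items():
        p = freq if marker in evidence else 1.0 - freq
        total += math.log10(p)
    return total


def log10_likelihood_ratio(
    evidence: Set[str],
    ref_h1: List[Set[str]],
    ref_h2: List[Set[str]],
) -> float:
    """log10 of P(evidence | H1) / P(evidence | H2).

    ref_h1 and ref_h2 are the reference populations defined under each hypothesis.
    """
    markers = set().union(evidence, *ref_h1, *ref_h2)
    return (
        log10_likelihood(evidence, marker_frequencies(ref_h1, markers))
        - log10_likelihood(evidence, marker_frequencies(ref_h2, markers))
    )


if __name__ == "__main__":
    # Hypothetical marker labels, for illustration only.
    batch_a = [{"dupA", "snp1"}, {"dupA"}]   # reference samples under H1
    batch_b = [{"delB"}, {"delB", "snp1"}]   # reference samples under H2
    evidence_sample = {"dupA"}
    llr = log10_likelihood_ratio(evidence_sample, batch_a, batch_b)
    print(f"log10 LR = {llr:.3f}, LR = {10 ** llr:.1f}")
```

A positive log10 likelihood ratio supports the first hypothesis (the sample originates from the first reference population) and a negative value supports the second. Reporting the ratio, rather than a bare match or non-match, is what allows the strength of the evidence to be communicated.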