Abstract

Attributing pathogenic strains isolated from environmental samples or human cases to their sources is challenging. These pathogens usually colonize multiple animal hosts, including livestock, which contaminate the food-production chain and the environment (e.g. soil and water). This adds to the public-health burden and makes identifying the source harder. Genomic data open up new opportunities for developing statistical models that indicate the likely source of pathogen contamination. Here, we propose a computationally fast and efficient multinomial logistic regression source-attribution classifier that predicts the animal source of bacterial isolates from ‘source-enriched’ loci extracted from the accessory-genome profiles of a pangenomic dataset. Based on the accuracy of the model’s self-attribution step, the modeller selects the number of candidate accessory genes that best fits the model, which is then used to calculate the likelihood of membership in each (source) category. The Accessory genes-Based Source Attribution (AB_SA) method was applied to a dataset of strains of Salmonella enterica Typhimurium and its monophasic variant (S. enterica 1,4,[5],12:i:-). The model was trained on 69 strains with known animal-source categories (i.e. poultry, ruminant and pig). The AB_SA method identified 8 of the 2802 accessory genes as predictors, and the self-attribution accuracy was 80 %. The AB_SA model then classified 25 of the 29 S. enterica Typhimurium and S. enterica 1,4,[5],12:i:- isolates collected from the environment (considered to be of unknown source) into a specific category (i.e. animal source) with a probability greater than 85 %. The AB_SA method described here provides a user-friendly and valuable tool for performing source-attribution studies in only a few steps. AB_SA is written in R and freely available at https://github.com/lguillier/AB_SA.
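The classification step summarised in the abstract can be sketched as follows. This is a hedged illustration, not the AB_SA code (which is written in R): it fits a multinomial logistic regression to 0/1 presence/absence profiles of selected accessory genes, reports self-attribution accuracy on the training strains, and assigns an isolate to a source only when its top posterior probability exceeds 0.85. The gradient-descent fitting, the small L2 penalty, all function names and the toy gene matrix are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]
SOURCES = ["poultry", "ruminant", "pig"]  # source categories named in the study


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: converts class scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W: Matrix, x: Sequence[float]) -> List[float]:
    """Posterior probability of each source for one isolate's gene profile."""
    xb = list(x) + [1.0]  # constant term for the intercept
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in W])


def fit_mlr(X: Matrix, y: List[int], n_classes: int,
            lr: float = 0.5, epochs: int = 2000, l2: float = 0.005) -> Matrix:
    """Fit a multinomial logistic regression by batch gradient descent.

    X: 0/1 presence/absence of the selected accessory genes (one row per strain).
    y: source index of each training strain.
    Returns one weight row per class; the last entry of each row is the intercept.
    The small L2 penalty (an assumption) keeps weights finite on separable data.
    """
    n, p = len(X), len(X[0])
    W = [[0.0] * (p + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (p + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            xb = list(xi) + [1.0]
            probs = predict_proba(W, xi)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j in range(p + 1):
                    grad[k][j] += err * xb[j]
        for k in range(n_classes):
            for j in range(p + 1):
                penalty = l2 * W[k][j] if j < p else 0.0  # intercepts not penalised
                W[k][j] -= lr * (grad[k][j] / n + penalty)
    return W


def best_class(probs: Sequence[float]) -> int:
    """Index of the most probable source."""
    return max(range(len(probs)), key=lambda k: probs[k])


def self_attribution_accuracy(W: Matrix, X: Matrix, y: List[int]) -> float:
    """Share of training strains assigned back to their own known source."""
    hits = sum(best_class(predict_proba(W, xi)) == yi for xi, yi in zip(X, y))
    return hits / len(y)


def attribute(W: Matrix, x: Sequence[float], threshold: float = 0.85) -> Optional[int]:
    """Source index if its posterior exceeds the threshold, else None (unassigned)."""
    probs = predict_proba(W, x)
    k = best_class(probs)
    return k if probs[k] > threshold else None


# Hypothetical toy data: columns 0-2 are marker genes for poultry, ruminant
# and pig; column 3 is an uninformative gene.
X_train = [
    [1, 0, 0, 1], [1, 0, 0, 0], [1, 0, 0, 1],
    [0, 1, 0, 0], [0, 1, 0, 1], [0, 1, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 1, 0],
]
y_train = [0, 0, 0, 1, 1, 1, 2, 2, 2]

W = fit_mlr(X_train, y_train, n_classes=len(SOURCES))
print("self-attribution accuracy:", self_attribution_accuracy(W, X_train, y_train))
for profile in ([1, 0, 0, 0], [0, 0, 0, 0]):
    k = attribute(W, profile)
    print(profile, "->", SOURCES[k] if k is not None else "unassigned")
```

Isolates whose best posterior stays at or below the threshold are left unassigned, which mirrors how only some of the environmental isolates were placed into a source category.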

Highlights

  • Tracing the origin of pathogenic microbial strains associated with human diseases or contamination of environmental settings is crucial for identifying targets for intervention in the food-production chain from farm to fork

  • 98 S. enterica Typhimurium and monophasic-variant strains were used in this study as input for implementing a multinomial logistic regression (MLR) model of source attribution

  • The whole accessory-gene content was screened for genes enriched in each animal source
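The enrichment step in the last highlight can be illustrated with a simple prevalence-difference score. This is a hypothetical criterion chosen for clarity; the text here does not state AB_SA's actual enrichment test. For each source, genes are ranked by how much more often they occur in strains from that source than in strains from the other sources, and the top k per source are kept as candidate predictors. In the workflow described in the abstract, k would then be tuned against self-attribution accuracy. The function names and the toy matrix are made up.

```python
from typing import Dict, List, Sequence

Profile = Sequence[int]  # 0/1 presence/absence of each accessory gene


def prevalence(rows: Sequence[Profile], j: int) -> float:
    """Fraction of strains in `rows` that carry gene j."""
    return sum(r[j] for r in rows) / len(rows)


def enrichment_scores(X: Sequence[Profile], sources: Sequence[str]) -> Dict[str, List[float]]:
    """Per source: prevalence in that source minus prevalence in all other sources."""
    scores: Dict[str, List[float]] = {}
    for s in sorted(set(sources)):
        inside = [x for x, src in zip(X, sources) if src == s]
        outside = [x for x, src in zip(X, sources) if src != s]
        scores[s] = [prevalence(inside, j) - prevalence(outside, j)
                     for j in range(len(X[0]))]
    return scores


def top_enriched_genes(X: Sequence[Profile], sources: Sequence[str], k: int) -> List[int]:
    """Union of the k highest-scoring gene indices for every source."""
    selected = set()
    for sc in enrichment_scores(X, sources).values():
        ranked = sorted(range(len(sc)), key=lambda j: sc[j], reverse=True)
        selected.update(ranked[:k])
    return sorted(selected)


# Hypothetical toy matrix: 6 strains x 5 accessory genes.
X = [
    [1, 0, 0, 1, 1], [1, 0, 0, 0, 1],  # poultry
    [0, 1, 0, 1, 1], [0, 1, 0, 0, 1],  # ruminant
    [0, 0, 1, 1, 1], [0, 0, 1, 0, 1],  # pig
]
sources = ["poultry", "poultry", "ruminant", "ruminant", "pig", "pig"]

print(top_enriched_genes(X, sources, k=1))  # -> [0, 1, 2]
```

Genes carried by every strain (column 4) or spread evenly across sources (column 3) score 0 for every source and are not selected, which is the intended behaviour of an enrichment filter.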


Summary

Introduction

Tracing the origin of pathogenic microbial strains associated with human diseases or contamination of environmental settings is crucial for identifying targets for intervention in the food-production chain from farm to fork. Genetic variation in micro-organisms results from different evolutionary forces. These can be either neutral processes (genetic drift) or adaptive processes, such as the emergence of a mutation that is competitively advantageous in a given environment. Most bacterial populations are structured, i.e. they do not form a single genetically homogeneous unit but consist of several distinct lineages or sub-lineages that are entirely or partially isolated from one another. Factors such as geographical isolation, combined with random phenomena such as genetic drift and sometimes with local adaptation, drive the genetic differentiation

