Abstract
Recent studies have shown that predictive models can supplement or provide alternatives to E. coli testing for assessing the potential presence of food safety hazards in water used for produce production. However, these studies used balanced training data and focused on enteric pathogens. As such, research is needed to determine 1) if predictive models can be used to assess Listeria contamination of agricultural water, and 2) how resampling (to deal with imbalanced data) affects performance of these models. To address these knowledge gaps, this study developed models that predict nonpathogenic Listeria spp. (excluding L. monocytogenes) and L. monocytogenes presence in agricultural water using various combinations of learner (e.g., random forest, regression), feature type, and resampling method (none, oversampling, SMOTE). Four feature types were used in model training: microbial, physicochemical, spatial, and weather. “Full models” were trained using all four feature types, while “nested models” used between one and three types. In total, 45 full (15 learners*3 resampling approaches) and 108 nested (5 learners*9 feature sets*3 resampling approaches) models were trained per outcome. Model performance was compared against baseline models where E. coli concentration was the sole predictor. Overall, the machine learning models outperformed the baseline E. coli models, with random forests outperforming models built using other learners (e.g., rule-based learners). Resampling produced more accurate models than not resampling, with SMOTE models outperforming, on average, oversampling models. Regardless of resampling method, spatial and physicochemical water quality features drove accurate predictions for the nonpathogenic Listeria spp. and L. monocytogenes models, respectively. Overall, these findings 1) illustrate the need for alternatives to existing E. coli-based monitoring programs for assessing agricultural water for the presence of potential food safety hazards, and 2) suggest that predictive models may be one such alternative. Moreover, these findings provide a conceptual framework for how such models can be developed in the future with the ultimate aim of developing models that can be integrated into on-farm risk management programs. For example, future studies should consider using random forest learners, SMOTE resampling, and spatial features to develop models to predict the presence of foodborne pathogens, such as L. monocytogenes, in agricultural water when the training data is imbalanced.
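As a rough illustration of the two resampling strategies named in the abstract, the sketch below contrasts random oversampling (duplicating minority-class samples) with a simplified SMOTE (interpolating between a minority sample and one of its nearest minority-class neighbors). It is not the study's implementation: the function names, the two-feature toy data, and the choice of k = 1 are all assumptions made for illustration.

```python
import random


def random_oversample(minority, n_new, rng):
    """Duplicate randomly chosen minority samples (sampling with replacement)."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote(minority, n_new, k, rng):
    """Simplified SMOTE: build synthetic minority samples by interpolating
    between a sample and one of its k nearest minority-class neighbors.
    Requires at least two minority samples."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbors of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical toy data (not from the study): two made-up features per sample
negatives = [[0.2, 7.1], [0.3, 6.9], [0.1, 7.4], [0.4, 7.0], [0.2, 7.2], [0.5, 6.8]]
positives = [[1.8, 6.1], [2.1, 5.9]]

rng = random.Random(0)
n_needed = len(negatives) - len(positives)  # samples needed to balance classes

duplicated = random_oversample(positives, n_needed, rng)
interpolated = smote(positives, n_needed, k=1, rng=rng)

print("oversampled:", duplicated)
print("SMOTE-like: ", interpolated)
```

Random oversampling only repeats existing positive samples, whereas the SMOTE-style points fall between existing positives in feature space. In a typical workflow, either method would be applied to the training data only, so that evaluation data keep their original class balance.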
Highlights
Given the number of high-profile, multistate outbreaks linked to fresh produce over the last two decades, preharvest produce safety is of increasing concern to government and industry stakeholders as well as consumers (Newell et al., 2010; Zhu et al., 2017).
While we acknowledge that Salmonella and pathogenic E. coli are the primary organisms of concern in surface water used for produce production, Listeria spp. and L. monocytogenes were used as model organisms here because 1) we lacked access to suitable data on Salmonella and pathogenic E. coli contamination of agricultural waterways, and 2) L. monocytogenes is a foodborne pathogen of concern whose presence in agricultural water could lead to recalls and illness when contamination carries through to the finished product (Garner and Kathariou, 2016).
L. monocytogenes is a foodborne pathogen of concern, and L. monocytogenes contamination of agricultural water may lead to human illness.
Summary
Given the number of high-profile, multistate outbreaks linked to fresh produce over the last two decades, preharvest produce safety is of increasing concern to government and industry stakeholders as well as consumers (Newell et al., 2010; Zhu et al., 2017). There is a clear need for alternative strategies for identifying produce safety hazards in surface waterways that provide water for produce production.