Historical data on food safety monitoring often serve as an information source in designing monitoring plans. However, such data are often unbalanced: a small fraction of the dataset refers to food safety hazards that are present in high concentrations (representing commodity batches with a high risk of being contaminated, the positives), and a large fraction refers to hazards that are present in low concentrations (representing commodity batches with a low risk of being contaminated, the negatives). Such unbalanced datasets complicate modeling to predict the probability that a commodity batch is contaminated. This study proposes a weighted Bayesian network (WBN) classifier to improve prediction accuracy for the presence of food and feed safety hazards using unbalanced monitoring data, specifically for the presence of heavy metals in feed. Applying different weight values resulted in different classification accuracies for each class; the optimal weight value was defined as the value that yielded the most effective monitoring plan, that is, the plan identifying the highest percentage of contaminated feed batches. Results showed that the unweighted Bayesian network classifier produced a large difference between the classification accuracy for positive samples (20%) and for negative samples (99%). With the WBN approach, the classification accuracies for positive and negative samples were both around 80%, and the monitoring effectiveness increased from 31% to 80% for a pre-set sample size of 3000. Results of this study can be used to improve the effectiveness of monitoring various food safety hazards in food and feed.
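The trade-off between the two class accuracies can be illustrated with a minimal Python sketch. It is not the WBN of this study: it assumes that weighting the positive class by a factor w simply multiplies the posterior odds of "contaminated" by w, which is one simple reading of class weighting. The function names, the probabilities, and the labels below are all hypothetical toy values chosen for illustration.

```python
def reweight_posterior(p_pos, weight):
    """Multiply the posterior odds of the positive class by `weight`.

    Assumption: class weighting acts only on the odds; the study's WBN
    may apply its weights differently.
    """
    num = weight * p_pos
    return num / (num + (1.0 - p_pos))


def per_class_accuracy(p_pos_list, labels, weight, threshold=0.5):
    """Return (accuracy on positives, accuracy on negatives)."""
    tp = fn = tn = fp = 0
    for p, y in zip(p_pos_list, labels):
        predicted_positive = reweight_posterior(p, weight) >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical posteriors P(contaminated) from an unweighted classifier,
# with 3 positives and 7 negatives (toy data, not from the study).
probs = [0.40, 0.30, 0.60, 0.10, 0.05, 0.20, 0.02, 0.15, 0.01, 0.35]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

for w in (1.0, 3.0):
    acc_pos, acc_neg = per_class_accuracy(probs, labels, w)
    print(f"weight={w}: positives {acc_pos:.2f}, negatives {acc_neg:.2f}")
# weight=1.0: positives 0.33, negatives 1.00
# weight=3.0: positives 1.00, negatives 0.86
```

In this toy case, raising the weight trades a small loss in accuracy on negatives for a large gain on positives, which is the kind of balance the abstract reports. Selecting the weight by monitoring effectiveness, as the study does, would require an additional evaluation step that is not shown here.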