Antibiotic resistance is one of the major concerns in veterinary and human medicine and poses a considerable threat to both human and animal health. Overuse and misuse of antibiotics have been shown to be among the primary drivers of antibiotic resistance. To strengthen the surveillance of antibiotic use, Switzerland introduced the "Informationssystem Antibiotika in der Veterinärmedizin" (IS ABV; Information System for Antibiotics in Veterinary Medicine) in 2019, which mandates electronic registration of antibiotic prescriptions by all veterinarians in Switzerland. However, initial data analysis revealed a considerable number of implausible data entries, potentially compromising data quality and reliability. These anomalies may arise from input errors, inaccuracies, incorrect or aberrant master data, or data transmission problems, and they can make analysis impossible. To address this issue efficiently, we propose a two-stage anomaly detection framework based on machine learning algorithms. In this study, we focused on cattle treatments, both single and group therapy, because cattle are the species with the highest prescription volume. Not all outliers are necessarily incorrect, however; some may be legitimate but unusual antibiotic treatments. Expert review therefore plays a crucial role in distinguishing correct outliers from actual errors. First, relevant prescription variables were extracted and pre-processed with a custom-built scaler. A set of unsupervised algorithms then calculated an outlier probability for each data point and identified the most likely outliers. In collaboration with experts, we annotated anomalies and established anomaly thresholds for each production type and active substance. These expert-annotated labels were then used to fine-tune the final supervised classification algorithms. With this methodology, we identified 22,816 anomalies among a total of 1,994,170 cattle prescriptions (1.1 %). Cattle with no further specified production type had the highest proportion of anomalies (2 %; 7,758 of 379,995).
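The first, unsupervised stage can be illustrated with a minimal sketch. The custom-built scaler and the specific unsupervised algorithms are not described here, so this code swaps in a different, simpler technique: a median/MAD-based modified z-score (Iglewicz and Hoaglin) computed on log-transformed doses within each production type and active substance group. The field names, the example substance and the per-group thresholds are hypothetical stand-ins for the expert-set thresholds, not values from the study.

```python
"""Stage-one sketch: per-group robust scoring of prescription doses.

Assumptions (not from the study): a modified z-score on log10(dose) replaces
the authors' custom scaler and unsupervised algorithms; the keys
'production_type', 'substance', 'dose_mg_per_kg' and all thresholds are
hypothetical.
"""
import math
from collections import defaultdict
from statistics import median


def modified_z_scores(values):
    """Iglewicz-Hoaglin modified z-score: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No robust spread in this group; this sketch flags nothing here.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(prescriptions, thresholds, default_threshold=3.5):
    """Flag prescriptions whose log-dose deviates strongly within their group.

    thresholds maps hypothetical (production_type, substance) keys to
    cut-offs, standing in for expert-set thresholds. Doses must be > 0.
    """
    groups = defaultdict(list)
    for idx, p in enumerate(prescriptions):
        groups[(p["production_type"], p["substance"])].append(idx)

    flags = [False] * len(prescriptions)
    for key, idxs in groups.items():
        # Log scale: a tenfold overdose and a tenfold underdose are symmetric.
        log_doses = [math.log10(prescriptions[i]["dose_mg_per_kg"]) for i in idxs]
        cut = thresholds.get(key, default_threshold)
        for i, z in zip(idxs, modified_z_scores(log_doses)):
            flags[i] = abs(z) > cut
    return flags


if __name__ == "__main__":
    data = [
        {"production_type": "dairy", "substance": "X", "dose_mg_per_kg": d}
        for d in (10, 11, 9, 10, 12, 10, 100)
    ]
    flags = flag_outliers(data, thresholds={("dairy", "X"): 3.5})
    print(flags)
    print(f"anomaly rate: {sum(flags) / len(flags):.1%}")
```

In the framework described above, such flags would only be candidates: the expert annotation and the supervised classifier decide which of them are genuine errors.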
The anomalies were consistently identified and included prescriptions with both excessively high and excessively low dosages. For single treatments, Random Forest achieved a ROC-AUC of 0.994 (95 % CI: 0.992, 0.995) and an F1 score of 0.962 (95 % CI: 0.958, 0.966). The framework is versatile and can be adapted to other species within IS ABV and potentially to other prescription-based surveillance systems. If applied regularly to uploaded prescriptions, it should reduce input errors over time and improve the long-term validity of the data.
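The text does not say how the 95 % confidence intervals for ROC-AUC and F1 were computed. As one common option, the sketch below computes percentile-bootstrap intervals from held-out labels and classifier scores using only the standard library; the function names, the 0.5 decision threshold and the toy data are assumptions for illustration, not the study's evaluation code.

```python
"""Percentile-bootstrap 95 % CIs for ROC-AUC and F1 (a sketch).

Assumption: the CI method is not stated in the source; a percentile bootstrap
over test-set predictions is shown as one option. All inputs are hypothetical.
"""
import random


def roc_auc(y_true, scores):
    """ROC-AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def bootstrap_ci(y_true, scores, threshold=0.5, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) intervals for ROC-AUC and F1."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(y_true)
    aucs, f1s = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # ROC-AUC is undefined without both classes
        sc = [scores[i] for i in idx]
        aucs.append(roc_auc(yt, sc))
        f1s.append(f1_score(yt, [int(s >= threshold) for s in sc]))
    if not aucs:
        raise ValueError("no bootstrap resample contained both classes")

    def percentile_interval(vals):
        # Simple index-based percentiles of the sorted bootstrap estimates.
        vals = sorted(vals)
        k = len(vals) - 1
        return vals[int(alpha / 2 * k)], vals[int((1 - alpha / 2) * k)]

    return percentile_interval(aucs), percentile_interval(f1s)


if __name__ == "__main__":
    y = [0, 0, 0, 0, 0, 1, 1, 1, 0, 1]
    s = [0.1, 0.2, 0.3, 0.15, 0.6, 0.7, 0.9, 0.4, 0.05, 0.8]
    (auc_lo, auc_hi), (f1_lo, f1_hi) = bootstrap_ci(y, s, n_boot=500)
    print(f"ROC-AUC {roc_auc(y, s):.3f} (95 % CI: {auc_lo:.3f}, {auc_hi:.3f})")
    print(f"F1 {f1_score(y, [int(v >= 0.5) for v in s]):.3f} (95 % CI: {f1_lo:.3f}, {f1_hi:.3f})")
```

With a Random Forest, the scores would typically be the predicted probability of the anomaly class. Resamples containing only one class are skipped because ROC-AUC is undefined for them; with a large test set this rarely happens.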