Prediction of Listeria monocytogenes Clonal Complexes from Multilocus Variable Number Tandem Repeat Analysis Patterns Using a Machine Learning Approach.

Nicholas Andrews,Natalia Unrath,Patrick Wall,James F Buckley,Séamus Fanning

doi:10.1089/fpd.2023.0163

Abstract

Multilocus variable number tandem repeat analysis (MLVA) is a molecular subtyping technique that remains useful for those without the resources to access whole genome sequencing for the tracking and tracing of bacterial contaminants. Unlike techniques such as multilocus sequence typing (MLST) and pulsed-field gel electrophoresis, MLVA did not emerge as a standardized subtyping method for Listeria monocytogenes, and as a result, there is no reference database of virulent or food-associated MLVA subtypes as there is for MLST-based clonal complexes (CCs). Having previously shown the close congruence of a 5-loci MLVA scheme with MLST, a predictive model was created using the XGBoost machine learning (ML) technique, which enabled the prediction of CCs from MLVA patterns with ∼85% (±4%) accuracy. As well as validating the model on existing data, a straightforward update protocol was simulated for if and when previously unseen subtypes might arise. This article illustrates how ML techniques can be applied with elementary coding skills to add value to previous-generation molecular subtyping data in-built food processing environments.

Full Text