Discriminative tasks, i.e., the identification of different food materials, brands, and origins, have become an essential part of food safety control. In recent years, spectroscopic profiling combined with machine learning has become popular for food-related discriminative tasks, but finding an appropriate classification model can be challenging. As an alternative to the current "trial-and-error" practice, this paper proposes a dedicated two-step classifiability analysis framework. The first step collects more than 90 diversified metrics that measure dataset separability from different perspectives. The second step synthesizes these metrics into a quantitative score using meta-learner and decomposition-based strategies. Two Raman spectroscopic profiling case studies were conducted to validate the method; they yielded higher scores for the easily separable liquor dataset (around 1.0) than for the more challenging table salt dataset (< 0.5). This score can help researchers determine the required model complexity and assess the adequacy of the current physicochemical profiling instrument. We expect the proposed classifiability analysis framework to generalize to a wide range of machine learning applications in the food domain that involve data-driven classification or discriminative tasks.