In this study, a simple and accurate approach is proposed for identifying the origin of raspberry samples by combining Raman spectral preprocessing, feature selection, and machine learning algorithms. A window function was introduced and combined with a baseline removal technique to preprocess the Raman spectral data, reducing the dimensionality of the raw data while preserving the quality of the processed data. The window function parameter was optimized, and a binning window width of 5 yielded the highest accuracy. Of the three feature selection techniques applied, the information gain model performed best in extracting discriminative spectral features. Finally, ten different machine learning algorithms were employed to construct predictive models, and the optimal models were selected. Linear Support Vector Classifier (LinearSVC), Multi-Layer Perceptron Classifier (MLPClassifier), and Linear Discriminant Analysis (LDA) achieved accuracy, precision, recall, and F1 values above 0.96, while the Random Vector Functional Link Network Classifier (RVFLClassifier) exceeded 0.93 on the same metrics. These results demonstrate that the proposed approach identifies the origin of raspberry samples with high accuracy and robustness, providing a valuable tool for agricultural product authentication and quality control.
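
As a rough illustration of the binning step described above, the following sketch averages consecutive, non-overlapping windows of 5 points, which shrinks a spectrum to about one fifth of its original length. The function name `bin_spectrum`, the use of the mean as the aggregate, and the dropping of an incomplete trailing window are all assumptions made for this example; the abstract does not specify these details, and the example spectrum is invented.

```python
from statistics import fmean


def bin_spectrum(intensities, width=5):
    """Average consecutive, non-overlapping windows of `width` points.

    Assumption: trailing points that do not fill a whole window are
    dropped. The source does not say how the tail is handled.
    """
    if width < 1:
        raise ValueError("width must be a positive integer")
    n_bins = len(intensities) // width
    return [
        fmean(intensities[i * width:(i + 1) * width])
        for i in range(n_bins)
    ]


# Hypothetical 12-point spectrum, for illustration only.
spectrum = [1.0, 2.0, 3.0, 4.0, 5.0,
            10.0, 10.0, 10.0, 10.0, 10.0,
            7.0, 8.0]

binned = bin_spectrum(spectrum, width=5)
print(binned)  # two bins; the last two points are dropped
```

Averaging within each window also smooths point-to-point noise, which may be one reason a moderate width such as 5 works well; wider windows would reduce dimensionality further but risk merging neighbouring Raman peaks.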