Jujube is an important economic crop in Xinjiang, China, and the jujube industry provides abundant employment and contributes to local economic development. Because jujubes from different regions have high economic value and look alike, counterfeit products appear on the market, so tracing the origin of jujubes is necessary. In this study, we conducted traceability research on jujubes from four locations: Alaer, Hotan, Ruoqiang, and Zhangye. Near-infrared spectroscopy was used to collect data from the jujubes non-destructively. The collected data were transformed into one-dimensional data, and six algorithms, namely the Back Propagation Neural Network (BP), Radial Basis Function Neural Network (RBF), Convolutional Neural Network (CNN), Long Short-Term Memory Network (LSTM), Support Vector Machine (SVM), and Random Forest (RF), were used to classify the resulting 4000 sets of one-dimensional data. The results showed that, when sufficient data were available, the RBF, LSTM, and CNN performed best, with accuracies of 93.50 %, 94.33 %, and 94.25 %, respectively. The RF achieved an accuracy of 92.42 % and demonstrated good detection efficiency. However, the BP and SVM performed relatively poorly, with accuracies of 90.42 % and 80.42 %, respectively. Because collecting large datasets during detection is time-consuming, smaller sets of one-dimensional data were randomly selected for further experiments, in which the subsets of 400 and 700 sets of data stood out. The final experimental results demonstrate that, on the small-scale dataset of 400 instances, the CNN performs best, with an accuracy of 86.67 %. The LSTM performs slightly worse than the CNN, with an accuracy of 84.12 %. 
The accuracies of the BP, RBF, SVM, and RF were lower, at 82.5 %, 82.5 %, 74.12 %, and 74.18 %, respectively. In the experiments with a dataset of 700 instances, the CNN stands out. Compared with the CNN on the 400-instance dataset and with the second-best algorithm, the LSTM, its accuracy, precision, recall, and F1-score improved significantly, by 53.13 %, 98.98 %, 481.43 %, and 140.36 %, respectively. The corresponding values are 0.9043, 0.9083, 0.9052, and 0.906. This study demonstrates non-destructive traceability of jujube fruits using near-infrared spectroscopy data combined with neural network algorithms. It also reduces the need for additional data preprocessing steps and achieves promising detection results. This research has positive implications for the traceability of jujube fruits and related crops.
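
The four metrics reported above (accuracy, precision, recall, and F1-score) can be illustrated for a four-origin classification task with a minimal sketch. The abstract does not state how the per-class scores were averaged, so macro averaging is an assumption here; the function name `macro_metrics` and the toy labels are hypothetical and are not the study's code or data.

```python
from collections import Counter

# The four origins named in the abstract, used as class labels.
ORIGINS = ["Alaer", "Hotan", "Ruoqiang", "Zhangye"]


def macro_metrics(y_true, y_pred, labels=ORIGINS):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging (unweighted mean over classes) is an assumption;
    the abstract does not say which averaging the authors used.
    """
    # Count each (true label, predicted label) pair: a sparse confusion matrix.
    pairs = Counter(zip(y_true, y_pred))
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = pairs[(c, c)]
        predicted = sum(pairs[(t, c)] for t in labels)  # samples predicted as c
        actual = sum(pairs[(c, p)] for p in labels)     # samples truly from c
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example (invented labels, not data from the study).
y_true = ["Alaer", "Alaer", "Hotan", "Hotan",
          "Ruoqiang", "Ruoqiang", "Zhangye", "Zhangye"]
y_pred = ["Alaer", "Hotan", "Hotan", "Hotan",
          "Ruoqiang", "Ruoqiang", "Zhangye", "Alaer"]

metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

With class-balanced data, as in this toy example, macro-averaged recall equals accuracy, which is one reason the two reported values can lie close together.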