With the rise of deep learning, spectroscopic analysis combined with deep learning methods has been widely applied in agriculture, for example in crop disease detection, soil analysis, and crop quality assessment. However, enough spectral samples to train deep learning models are often difficult to obtain because of environmental constraints, equipment limitations, and labour costs. To address this issue, we propose a spectral sample augmentation technique based on K-condition boundary equilibrium generative adversarial networks (KC-BEGAN). First, a stable boundary equilibrium generative adversarial network (BEGAN) model is constructed. The KC-BEGAN model is then built on it by incorporating chemical property labels, multiscale gradient information, and the k-nearest neighbour algorithm, with the goal of generating complete spectral samples together with their chemical property labels. Second, we compare generated and real samples using t-distributed stochastic neighbour embedding (t-SNE), the Mahalanobis distance, the F test, and the maximum mean discrepancy (MMD). After data augmentation with the KC-BEGAN model, the test-set R² improved by 4.9%, 0.4%, 2.5%, and 4.4% for the traditional regression models (PLSR, SVR, RR, PCR) and by 3.6%, 4.4%, and 6.8% for the deep regression models (Inception-ResNet, Inception, 1D-CNN), respectively. Furthermore, we replace the GAN module in the KC-BEGAN model with Diffusion-GAN and repeat the same experimental procedure to evaluate whether other GAN models can also augment labelled near-infrared spectral samples. The results indicate that the spectral samples generated by the KC-BEGAN model are reliable and can meet the expansion requirements of small spectral sample sets. They also show that replacing the GAN module in KC-BEGAN with other GAN models is feasible, suggesting that the continuous labelling method proposed in this study is effective.
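Two of the checks named above, the maximum mean discrepancy and the Mahalanobis distance, compare generated spectra against real ones. The sketch below shows one common way to compute them. It is an illustration, not the authors' code. The following choices are assumptions:

* a Gaussian (RBF) kernel,
* a median-heuristic bandwidth,
* the biased MMD² estimator,
* a pseudo-inverse covariance.

The function names are hypothetical. Spectra are assumed to be NumPy arrays of shape (samples, wavelengths).

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between every row of a and every row of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by round-off


def median_bandwidth(x, y):
    """Median-heuristic bandwidth on the pooled sample (an assumed default)."""
    z = np.vstack([x, y])
    sq = pairwise_sq_dists(z, z)
    upper = sq[np.triu_indices_from(sq, k=1)]  # distinct pairs only
    return np.sqrt(np.median(upper))


def mmd2(x, y, bandwidth=None):
    """Biased estimate of the squared MMD between x (n, p) and y (m, p).

    Assumption: Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)).
    """
    if bandwidth is None:
        bandwidth = median_bandwidth(x, y)
    gamma = 1.0 / (2.0 * bandwidth**2)
    kxx = np.exp(-gamma * pairwise_sq_dists(x, x))
    kyy = np.exp(-gamma * pairwise_sq_dists(y, y))
    kxy = np.exp(-gamma * pairwise_sq_dists(x, y))
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


def mahalanobis_to_real(real, generated):
    """Mahalanobis distance of each generated spectrum from the real-sample mean."""
    mu = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    # Assumption: pseudo-inverse, because raw spectra often have more
    # wavelengths than samples, which makes the covariance singular.
    inv_cov = np.linalg.pinv(cov)
    diff = generated - mu
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(d2, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(100, 20))              # stand-in for real spectra
    close = rng.normal(size=(100, 20))             # synthetic set drawn from the same distribution
    shifted = rng.normal(loc=1.0, size=(100, 20))  # synthetic set with a shifted mean
    print(mmd2(real, close), mmd2(real, shifted))
    print(mahalanobis_to_real(real, close).mean(), mahalanobis_to_real(real, shifted).mean())
```

A generated set that matches the real distribution should give a smaller MMD² and smaller average Mahalanobis distances than a shifted one.

For near-infrared spectra, the Mahalanobis distance is often computed on principal component scores rather than raw absorbances. The pseudo-inverse used here is only a simple stand-in for that step.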