The rapid, non-destructive quantification of catechins in fermented black tea is crucial for evaluating black tea quality. Hyperspectral imaging combined with chemometrics has been applied for quantitative detection, but its performance is usually constrained by limited dataset size. To address the shortage of samples in the regression analysis of catechins, this study proposes an improved deep convolutional generative adversarial network with a labeling module, named DCGAN-L, for hyperspectral data augmentation. DCGAN-L consists of a spectral generating module and a label generating module. First, synthetic spectra were generated, and an indicator was proposed to evaluate their quality. Then, the corresponding label values were generated for epicatechin gallate (ECG), epigallocatechin (EGC), catechin (C), and total catechins (CC). To generate the labels, the Euclidean distances between each synthetic spectrum and all true spectra were measured, and weights derived from these distances were used to calculate the label values. The training dataset was then augmented with the generated synthetic data. Finally, the effect of data augmentation was evaluated with two regression models, random forest (RF) and broad learning system (BLS), for the quantification of catechins. Compared with the results before data augmentation, the average R² of the RF and BLS models increased by 0.044 and 0.164, respectively. The proposed DCGAN-L model enables rapid, non-destructive quantification of catechins in black tea when the sample size is limited.
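The label-generating step can be illustrated with a short sketch. The abstract says only that weights are allocated "based on" Euclidean distances, so the inverse-distance weighting below is an assumed scheme, not necessarily the one used in DCGAN-L. The function name `generate_labels` and the `eps` parameter are hypothetical. Each synthetic spectrum receives a label vector (e.g., ECG, EGC, C, CC) that is a distance-weighted average of the true labels, so that closer true spectra contribute more.

```python
import numpy as np


def generate_labels(synthetic, real_spectra, real_labels, eps=1e-12):
    """Hypothetical sketch: label synthetic spectra by inverse-distance weighting.

    synthetic:    (n_syn, n_bands) synthetic spectra (or a single 1-D spectrum)
    real_spectra: (n_real, n_bands) measured spectra
    real_labels:  (n_real, n_targets) measured contents, e.g. ECG, EGC, C, CC
    The weighting rule 1 / distance is an assumption for illustration.
    """
    synthetic = np.atleast_2d(np.asarray(synthetic, dtype=float))
    real_spectra = np.asarray(real_spectra, dtype=float)
    real_labels = np.asarray(real_labels, dtype=float)

    # Euclidean distance from every synthetic spectrum to every true spectrum,
    # giving a matrix of shape (n_syn, n_real).
    dist = np.linalg.norm(synthetic[:, None, :] - real_spectra[None, :, :], axis=2)

    # Closer true spectra get larger weights; eps avoids division by zero.
    weights = 1.0 / (dist + eps)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1

    # Weighted average of the true labels: shape (n_syn, n_targets).
    return weights @ real_labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.random((20, 50))          # 20 toy spectra with 50 bands
    labels = rng.random((20, 4))         # toy ECG, EGC, C, CC values
    fake = real[:3] + 0.01 * rng.standard_normal((3, 50))
    print(generate_labels(fake, real, labels).shape)
```

Because the weights in each row sum to 1, every generated label stays within the range of the true labels. A different decreasing function of distance, or restricting the average to the k nearest true spectra, could be substituted without changing the overall structure.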