To address the limited volume of structured data and the class imbalance common in machine learning tasks, a supervised generation method for structured data called WAGAN and a cyclic sampling method named SACS (Semi-supervised and Active-learning Cyclic Sampling), based on semi-supervised active learning, are proposed. The loss function and neural network structure are optimized to improve both the quantity and the quality of the small sample set. To enhance the reliability of the generated pseudo-labels, a Semi-supervised Active learning Framework (SAF) is designed. This framework reassigns class labels to samples, which not only improves the reliability of the generated samples but also reduces the influence of noise and uncertainty on pseudo-label generation. To mine the diversity information of the generated samples, an uncertainty sampling strategy based on spatial overlap is designed. This strategy uses global and local sampling methods to calculate the information content of the generated samples. Experimental results show that the proposed method outperforms other data augmentation methods on three different datasets. Compared with the original data, the average macro-F1 score of the classification model improves by 11.5%, 16.1%, and 19.6% relative to the compared methods.
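For readers unfamiliar with the reported metric, the macro-averaged F1 score is the unweighted mean of the per-class F1 scores, so a minority class counts as much as a majority class. That property makes it a natural choice for imbalanced data. The following is a minimal sketch of the standard definition; the function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy imbalanced example (hypothetical labels): class 1 is the minority.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]
print(macro_f1(y_true, y_pred))
```

In this toy case the majority class scores 8/9 and the minority class 2/3, so the macro average (7/9) is pulled down by the single minority-class error more than plain accuracy (5/6) would be.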