A network-based feature extraction model for imbalanced text data

Keping Li, Dongyang Yan, Yanyan Liu, Qiaozhen Zhu

doi:10.1016/j.eswa.2022.116600

Abstract

The explosive growth of text data has led many researchers to explore efficient methods for extracting valuable hidden information. Many technologies, especially deep learning methods, have achieved great success in text analysis. However, the most powerful methods typically require a considerable quantity of training data, and their performance may suffer when the data are imbalanced. In this paper, we propose a network-based Convolution Neural Network (NCNN) to mitigate the effect of imbalanced data. The proposed model first generates new synthetic samples for the imbalanced data based on random walks on the network. Then an extra layer, called the Polar Layer, is introduced to connect the output of the network model of the text to the classical CNN. Two electing strategies (n-NCNN and x-NCNN) are proposed to further improve the performance of NCNN. In the experimental section, the proposed model is applied to Reuters-21578 and WebKB. Comparison with six other approaches demonstrates the effectiveness of the proposed NCNN model on imbalanced text data.

Full Text