Abstract

The rise of the Internet has brought rapid growth in unstructured data recorded as text and audio. This study applies deep learning to unstructured data processing and proposes two key techniques for processing text data. First, a transformer feature extractor is used to produce dynamic word vectors, and an MCNN neural network is combined with it to screen key information, yielding a text classification model based on the MCNN-transformer. Second, text features extracted by the BERT model are fed into a VAE-GRU module and combined with the self-attention mechanism and the K-Means algorithm to construct a text clustering model based on VAE-GRU. In the experimental analysis, the MCNN-transformer model achieves high accuracy and a Macro-F1 value that exceeds 0.880, and it outperforms the other text classification models compared. The ACC and NMI results of the VAE-GRU model are both greater than 70% on the Stack Overflow and SearchSnippets datasets and greater than 48% on the Chinese dataset, and its performance is better than that of the three ablation models by 15.03% to 85.67%. The MCNN-transformer and VAE-GRU models are thus capable of classifying and clustering unstructured text data, which helps to improve the efficiency with which the information in unstructured data is understood and used.
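The abstract reports Macro-F1, ACC, and NMI without defining them. As a reading aid, the following is a minimal sketch of how these metrics are conventionally computed. It assumes that ACC is clustering accuracy under the best one-to-one mapping of cluster ids to classes and that NMI is normalized by the arithmetic mean of the two entropies; the abstract does not state which variants the authors used. The function names `macro_f1`, `clustering_acc`, and `nmi` are hypothetical, and labels are assumed never to be `None`.

```python
import math
from collections import Counter
from itertools import permutations


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def clustering_acc(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to classes.

    Assumption: brute force over permutations, which is only practical for
    a handful of clusters. Larger problems normally use the Hungarian
    algorithm instead.
    """
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    k = max(len(classes), len(clusters))
    # Pad the shorter side with None so every cluster gets a (possibly empty) class.
    classes = classes + [None] * (k - len(classes))
    clusters = clusters + [None] * (k - len(clusters))
    best = 0
    for perm in permutations(classes):
        mapping = dict(zip(clusters, perm))
        correct = sum(1 for t, p in zip(y_true, y_pred) if mapping[p] == t)
        best = max(best, correct)
    return best / len(y_true)


def nmi(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(y_true)
    pu, pv = Counter(y_true), Counter(y_pred)
    puv = Counter(zip(y_true, y_pred))
    mi = sum(c / n * math.log(c * n / (pu[u] * pv[v])) for (u, v), c in puv.items())
    hu = -sum(c / n * math.log(c / n) for c in pu.values())
    hv = -sum(c / n * math.log(c / n) for c in pv.values())
    denom = (hu + hv) / 2
    # Both partitions trivial (one group each): treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0
```

Because cluster ids are arbitrary, a clustering that recovers the true groups under different ids still scores 1.0 on both `clustering_acc` and `nmi`, which is why these two metrics, rather than plain accuracy, are the usual choice for evaluating text clustering.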

