Abstract

Single-label classification struggles to meet the needs of text classification, and multi-label text classification has become an important research problem in natural language processing (NLP). Extracting semantic features at different levels and granularities of text is a basic and key task in multi-label text classification research. A topic model is an effective method for automatically organizing and summarizing text information: it can reveal the latent semantics of documents and analyze the topics contained in massive amounts of information. This paper therefore proposes a multi-label text classification method based on tALBERT-CNN. An LDA topic model and an ALBERT model are used to obtain the topic vector and the semantic context vector of each word (document), a fusion mechanism combines them into deep topic and semantic representations of the document, and a TextCNN model extracts the multi-label features of the text to train a multi-label classifier. Experimental results on standard datasets show that the proposed method can extract multi-label features from documents and that it outperforms existing state-of-the-art multi-label text classification algorithms.

Highlights

  • Automatic text classification is an important means for humans to process massive amounts of text information

  • The multi-label text classification method proposed in this paper is fundamentally composed of two parts: deep topic and semantic representation based on topic ALBERT (tALBERT) and multi-label feature learning based on a convolution neural network (CNN)

  • 5, 6, we can find that our method is clearly superior to the latent Dirichlet allocation (LDA) topic model due to its use of probability feature statistics and the deep semantic model "A Lite BERT" (ALBERT). This fully shows that the combination of a topic model and a deep semantic model significantly improves performance on downstream natural language processing (NLP) tasks, which is consistent with the conclusion of reference [9]
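
The two-part design named in the highlights (a fused topic and semantic representation, followed by CNN feature learning) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the source does not specify the fusion mechanism, so per-token concatenation is an assumption, and every name, weight, and toy vector below is hypothetical. In practice the topic vectors would come from an LDA model and the context vectors from ALBERT.

```python
import math

def fuse(topic_vecs, context_vecs):
    """Concatenate each token's topic vector with its context vector.

    Concatenation is an ASSUMED fusion mechanism; the source only says
    that "a fusion mechanism" is used.
    """
    return [t + c for t, c in zip(topic_vecs, context_vecs)]

def conv_max_pool(seq, kernel, bias=0.0):
    """Apply one TextCNN-style filter: convolution, ReLU, max-over-time pooling.

    `kernel` holds one weight vector per position in the sliding window.
    """
    width = len(kernel)
    best = 0.0  # ReLU outputs are never below zero
    for start in range(len(seq) - width + 1):
        act = bias
        for offset in range(width):
            act += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        best = max(best, max(act, 0.0))
    return best

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_labels(features, label_weights, label_names, threshold=0.5):
    """Score each label with its own sigmoid and keep those above the threshold."""
    chosen = []
    for name, weights in zip(label_names, label_weights):
        score = sigmoid(sum(w * f for w, f in zip(weights, features)))
        if score >= threshold:
            chosen.append(name)
    return chosen

# Toy document of three tokens (hypothetical values).
topic_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]        # stand-in for LDA output
context_vecs = [[0.5, -0.5], [0.2, 0.1], [-0.3, 0.4]]    # stand-in for ALBERT output
fused = fuse(topic_vecs, context_vecs)                   # 3 tokens x 4 dimensions

# Two width-2 filters (hypothetical weights).
kernels = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
]
features = [conv_max_pool(fused, k) for k in kernels]

label_names = ["sports", "politics", "health"]
label_weights = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
predicted = predict_labels(features, label_weights, label_names)
```

Because each label gets an independent sigmoid rather than a shared softmax, the classifier can return zero, one, or several labels for the same document, which is what distinguishes the multi-label setting from the single-label one.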


Summary

Introduction

Automatic text classification is an important means for humans to process massive amounts of text information. Because text data environments are complex and changeable and many objects are polysemous, text classification faces many severe challenges. Traditional single-label text classification does not fully meet the needs of users. To better meet those needs, multi-label learning methods emerged [1]. Multi-label learning refers to the process of assigning to each instance the most relevant subset of class labels from the overall label set, thereby intuitively reflecting the various semantic contents of ambiguous objects.
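
To make the definition concrete, a label subset is commonly represented as a multi-hot vector aligned with the overall label set. The sketch below illustrates that encoding only; the label set and the helper names `to_multi_hot` and `from_multi_hot` are hypothetical and do not come from the paper.

```python
LABEL_SET = ["economy", "health", "politics", "sports"]  # hypothetical label set

def to_multi_hot(labels, label_set=LABEL_SET):
    """Encode a subset of labels as a 0/1 vector aligned with label_set."""
    unknown = set(labels) - set(label_set)
    if unknown:
        raise ValueError(f"labels not in label set: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in label_set]

def from_multi_hot(vector, label_set=LABEL_SET):
    """Recover the label subset from a 0/1 vector."""
    return [name for name, bit in zip(label_set, vector) if bit]

# A single-label instance has exactly one position set to 1.
single = to_multi_hot(["sports"])

# A multi-label instance may set any number of positions.
multi = to_multi_hot(["health", "economy"])
recovered = from_multi_hot(multi)
```

Single-label classification is the special case in which every target vector contains exactly one 1; the multi-label setting removes that restriction.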

