Research on Multi-label Text Classification Method Based on tALBERT-CNN

Wenfu Liu, Jianmin Pang, Nan Li, Feng Yue, Xin Zhou

doi:10.1007/s44196-021-00055-4

Abstract

Single-label classification technology has difficulty meeting the needs of text classification, and multi-label text classification has become an important research issue in natural language processing (NLP). Extracting semantic features from different levels and granularities of text is a basic and key task in multi-label text classification research. A topic model is an effective method for the automatic organization and induction of text information. It can reveal the latent semantics of documents and analyze the topics contained in massive information. Therefore, this paper proposes a multi-label text classification method based on tALBERT-CNN: an LDA topic model and ALBERT model are used to obtain the topic vector and semantic context vector of each word (document), a certain fusion mechanism is adopted to obtain in-depth topic and semantic representations of the document, and the multi-label features of the text are extracted through the TextCNN model to train a multi-label classifier. The experimental results obtained on standard datasets show that the proposed method can extract multi-label features from documents, and its performance is better than that of the existing state-of-the-art multi-label text classification algorithms.
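
To make the pipeline in the abstract more concrete, the following is a minimal sketch of how a per-token topic vector and a per-token context vector might be fused and then passed through one TextCNN-style filter. Everything here is an assumption for illustration: the abstract does not specify the fusion mechanism, so plain concatenation is used as a stand-in, the vectors are toy numbers rather than real LDA or ALBERT outputs, and the helper names `fuse` and `conv_max_pool` are hypothetical.

```python
# Illustrative sketch only: toy numbers and hypothetical helper names.
# Concatenation is an ASSUMED fusion; the paper's actual mechanism is not given here.


def fuse(topic_vecs, context_vecs):
    """Concatenate each token's topic vector with its context vector."""
    return [t + c for t, c in zip(topic_vecs, context_vecs)]


def conv_max_pool(tokens, kernel):
    """Slide one filter over token windows, apply ReLU, then max-pool over time."""
    width = len(kernel)  # number of consecutive tokens the filter covers
    activations = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        total = sum(
            w * x
            for kernel_row, token in zip(kernel, window)
            for w, x in zip(kernel_row, token)
        )
        activations.append(max(0.0, total))  # ReLU
    return max(activations)  # max-over-time pooling


# Three tokens: 2-dim topic vectors (stand-in for LDA output) and
# 2-dim context vectors (stand-in for ALBERT output).
topic_vecs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
context_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = fuse(topic_vecs, context_vecs)  # 3 tokens x 4 dimensions

# Two toy filters with window size 2; each row lines up with one fused token.
kernels = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, -1.0, 0.0]],
]

# One pooled value per filter; together they form the document feature vector.
features = [conv_max_pool(fused, k) for k in kernels]
print(features)
```

In a real model the filter weights would be learned and several window sizes would typically be used in parallel; the sketch only shows the shape of the computation.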

Highlights

Automatic text classification is an important means for humans to process massive amounts of text information
The multi-label text classification method proposed in this paper is fundamentally composed of two parts: deep topic and semantic representation based on topic ALBERT (tALBERT) and multi-label feature learning based on a convolution neural network (CNN)
5, 6, we can find that our method is obviously superior to the latent Dirichlet allocation (LDA) topic model due to its use of probability feature statistics and the deep semantic model A Lite BERT (ALBERT). This fully shows that combining a topic model with a deep semantic model significantly improves performance on downstream natural language processing (NLP) tasks, which is consistent with the conclusion of reference [9]

Summary

Introduction

Automatic text classification is an important means for humans to process massive amounts of text information. Due to complex and changeable text data environments and the existence of polysemous objects, text classification faces many severe challenges. The traditional single-label text classification method does not fully meet the needs of users. To better meet these needs, the multi-label learning method came into being [1]. Multi-label learning refers to the process of assigning the most relevant subset of class labels to each instance from the overall label set, thereby intuitively reflecting the various semantic information contents of ambiguous objects.
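
The idea of assigning a subset of labels to each instance can be illustrated with a short sketch. It assumes one common decision rule, namely scoring each label independently with a sigmoid and keeping every label whose score reaches a threshold; this rule, the label names, the scores, and the helper names are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical overall label set, for illustration only.
LABELS = ["sports", "finance", "politics", "health"]


def sigmoid(z):
    """Map a raw score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, threshold=0.5):
    """Keep every label whose independent sigmoid score reaches the threshold."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


def to_multi_hot(subset):
    """Encode a label subset as a 0/1 vector over the overall label set."""
    return [1 if label in subset else 0 for label in LABELS]


# Made-up raw scores for one document, one score per label.
logits = [2.0, -1.5, 0.3, -0.2]
predicted = predict_labels(logits)
print(predicted)                # ['sports', 'politics']
print(to_multi_hot(predicted))  # [1, 0, 1, 0]
```

Unlike single-label classification, where one would take the single highest-scoring class, each label is decided on its own, so a document can receive zero, one, or several labels.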

Journal: International Journal of Computational Intelligence Systems
Publication Date: Dec 1, 2021
Citations: 13
License type: open-access
