A Long-Text Classification Method of Chinese News Based on BERT and CNN

Xinying Chen, Peimin Cong, Shuo Lv

doi:10.1109/access.2022.3162614

Open Access

https://doi.org/10.1109/access.2022.3162614

Journal: IEEE Access
Publication Date: Jan 1, 2022
Citations: 58
License type: CC BY-NC-ND 4.0

Affiliation: Dalian Jiaotong University

Abstract

Text classification is an important research area in natural language processing (NLP) that has received considerable scholarly attention in recent years. However, real Chinese online news is characterized by long text, a large amount of information, and complex structure, all of which reduce the accuracy of Chinese long-text classification. To improve the accuracy of long-text classification of Chinese news, we propose a BERT-based local feature convolutional network (LFCN) model comprising four novel modules. First, to address the limit that Bidirectional Encoder Representations from Transformers (BERT) places on the maximum input sequence length, we propose a method named Dynamic LEAD-n (DLn), based on the traditional LEAD digest algorithm, to extract short texts from the long text. Second, in the Text-Text Encoder (TTE) module, we use the BERT pretrained language model to produce a sentence-level feature vector representation of a news text and to capture global features, using the attention mechanism to identify correlated words in the text. Third, we propose a CNN-based local feature convolution (LFC) module to capture local features in the text, such as key phrases. Finally, the feature vectors generated by these different operations at different stages are fused and used to predict the category of the news text. Experimental results show that the proposed method further improves the accuracy of long-text classification of Chinese news.
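
As a rough illustration of the LEAD idea that DLn builds on, the sketch below keeps the leading sentences of a news text so that the result fits an encoder's input limit. It is not the authors' DLn method, whose dynamic selection rule is not described in the abstract. The function names, the 510-character budget (a 512-token BERT limit minus two special tokens, assuming roughly one token per Chinese character), and the punctuation-based sentence splitter are all assumptions made for this example.

```python
import re

# Assumption: 512-token limit minus [CLS] and [SEP], one token per character.
MAX_CHARS = 510

# Sentence-ending punctuation used by this hypothetical splitter.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(text):
    """Split text into sentences, keeping the trailing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def lead_n(text, n):
    """Classic LEAD-n: keep only the first n sentences."""
    return "".join(split_sentences(text)[:n])


def lead_within_budget(text, max_chars=MAX_CHARS):
    """Keep leading sentences while the running length fits max_chars.

    This is one simple way to let n vary per document; it is a
    hypothetical stand-in, not the paper's DLn rule.
    """
    kept = []
    used = 0
    for sentence in split_sentences(text):
        if used + len(sentence) > max_chars:
            break
        kept.append(sentence)
        used += len(sentence)
    if not kept:
        # The first sentence alone exceeds the budget: hard truncate.
        return text[:max_chars]
    return "".join(kept)
```

Calling `lead_within_budget(article)` on a long article returns its opening sentences, which could then be passed to a BERT tokenizer. Cutting at sentence boundaries rather than at a fixed character offset avoids feeding the encoder a sentence that stops mid-phrase.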
