Abstract

As a core task of natural language processing and information retrieval, automatic text summarization is widely applied in many fields. Two approaches to the task currently exist: abstractive and extractive. Building on both, we propose a novel hybrid extractive-abstractive model that combines BERT (Bidirectional Encoder Representations from Transformers) word embeddings with reinforcement learning. First, we convert the human-written abstractive summaries into ground-truth extraction labels. Second, we use BERT word embeddings as the text representation and pre-train the two sub-models separately. Finally, the extraction network and the abstraction network are bridged by reinforcement learning. To verify the performance of the model, we compare it with popular automatic text summarization models on the CNN/Daily Mail dataset, using the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics for evaluation. Extensive experimental results show that the model clearly improves summarization accuracy.
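The first step above, converting human-written abstractive summaries into extractive ground-truth labels, is commonly done by greedy sentence matching. Below is a minimal, illustrative sketch, not the authors' exact procedure: a simple unigram-recall score stands in for full ROUGE, and all function names are hypothetical.

    # Illustrative sketch: for each summary sentence, greedily pick the article
    # sentence with the highest lexical overlap and record its index as a label.
    # Unigram recall is used here as a stand-in for ROUGE.

    def _tokens(text: str) -> list:
        return text.lower().replace(".", " ").replace(",", " ").split()

    def unigram_recall(candidate: str, reference: str) -> float:
        """Fraction of reference tokens that also appear in the candidate."""
        cand, ref = set(_tokens(candidate)), _tokens(reference)
        return sum(tok in cand for tok in ref) / len(ref) if ref else 0.0

    def make_extractive_labels(article_sents, summary_sents):
        """Indices of the article sentences that best match each summary sentence."""
        labels = []
        for ref in summary_sents:
            scores = [unigram_recall(sent, ref) for sent in article_sents]
            best = scores.index(max(scores))
            if best not in labels:        # avoid labeling the same sentence twice
                labels.append(best)
        return labels

    article = ["The storm hit the coast on Monday.",
               "Thousands of homes lost power.",
               "Officials urged residents to stay indoors."]
    summary = ["A storm knocked out power to thousands of homes."]
    print(make_extractive_labels(article, summary))  # -> [1]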

Highlights

  • Text summarization is the task of compressing a long text into a short one while preserving its central idea

  • The two models take different amounts of input: the input of the vanilla pointer-generator network is truncated to 400 article tokens, which causes loss of information, while the input of the new network is the key sentences from the aforementioned extraction model; in addition, the word embeddings of the two models are different: word2vec is used for the vanilla pointer-generator network, while BERT and its tokenizer are used in our abstractive model

  • BERT has achieved state-of-the-art performance on many natural language processing (NLP) tasks, but few works combine it with an extraction model and an abstraction model for text summarization via the policy gradient of reinforcement learning; a rough sketch of this bridging step is given below
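The following is a hedged sketch of how a policy-gradient (REINFORCE) step could bridge an extractor and an abstractor, assuming PyTorch. The class names, tensor shapes, and the stub reward are illustrative, not the paper's actual architecture; in the paper's setting, the reward would come from running the abstractor on the extracted sentences and scoring the output with ROUGE.

    # Sketch of a REINFORCE-style bridge between extractor and abstractor.
    import torch
    import torch.nn as nn

    class SentenceExtractor(nn.Module):
        """Scores each article sentence; the policy samples one to extract."""
        def __init__(self, emb_dim: int = 768):  # 768 = BERT-base hidden size
            super().__init__()
            self.scorer = nn.Linear(emb_dim, 1)

        def forward(self, sent_embs: torch.Tensor) -> torch.Tensor:
            # sent_embs: (num_sents, emb_dim) -> probability over sentences
            return torch.softmax(self.scorer(sent_embs).squeeze(-1), dim=0)

    def reinforce_step(extractor, sent_embs, reward_fn, optimizer, baseline=0.0):
        """One policy-gradient update: sample a sentence, score it, update."""
        probs = extractor(sent_embs)
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()               # index of the extracted sentence
        reward = reward_fn(action.item())    # e.g. ROUGE of the abstractor output
        loss = -dist.log_prob(action) * (reward - baseline)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return reward

    # Toy usage: random embeddings and a constant stub reward standing in for
    # "run the abstractor on the extracted sentence, then score with ROUGE".
    extractor = SentenceExtractor()
    opt = torch.optim.Adam(extractor.parameters(), lr=1e-4)
    embs = torch.randn(5, 768)
    reinforce_step(extractor, embs, reward_fn=lambda idx: 0.3, optimizer=opt)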


Summary

Introduction

Text summarization is the task of compressing a long text into a short one while preserving its central idea. When summarizing very long text, the extractive approach is too simple and produces poorly readable output, while the abstractive method, which compresses a long input sequence into a single fixed-length vector, may lose information; neither handles long-text summarization well. Prior work proposed a model for long-text summarization that generates the summary with deep communicating agents: the long input text is divided among multiple agent encoders, and the summary is generated through a unified decoder (a rough sketch of this idea follows). These methods have achieved good results, but owing to the limitations of specific datasets and small data volumes, their word embeddings are not expressive enough and cannot fully capture the semantic features of a text.
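For illustration, here is a rough sketch of the multi-agent encoding idea described above, assuming PyTorch. It omits the agent-communication and attention machinery of the actual model, and all names and dimensions are hypothetical.

    # Illustrative sketch: split a long article among several encoder "agents";
    # a single unified decoder would then attend over all of their outputs.
    import torch
    import torch.nn as nn

    def split_among_agents(tokens, num_agents):
        """Divide a long token sequence into roughly equal chunks, one per agent."""
        chunk = (len(tokens) + num_agents - 1) // num_agents
        return [tokens[i:i + chunk] for i in range(0, len(tokens), chunk)]

    class MultiAgentEncoder(nn.Module):
        def __init__(self, vocab_size=10000, emb_dim=128, hidden=256, num_agents=3):
            super().__init__()
            self.num_agents = num_agents
            self.embed = nn.Embedding(vocab_size, emb_dim)
            # One LSTM encoder per agent; the real model adds communication steps.
            self.agents = nn.ModuleList(
                nn.LSTM(emb_dim, hidden, batch_first=True)
                for _ in range(num_agents))

        def forward(self, token_ids):
            chunks = split_among_agents(token_ids, self.num_agents)
            outputs = []
            for agent, chunk in zip(self.agents, chunks):
                out, _ = agent(self.embed(torch.tensor(chunk)).unsqueeze(0))
                outputs.append(out)
            # Concatenated agent memories; a unified decoder would attend here.
            return torch.cat(outputs, dim=1)

    enc = MultiAgentEncoder()
    memory = enc(list(range(30)))   # toy "article" of 30 token ids
    print(memory.shape)             # torch.Size([1, 30, 256])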

Related Works
Problems
Experiments
Extraction
Abstraction
Training Procedure
Pre-Training
End-to-End Training
Reinforcement Learning
Datasets
Summary length
Detail
Metrics
Baselines
Result and Analysis
Result
Performance comparison with respect to the best baselines
Generalization
Redundancy Issue
Training Speed
Case Study
Findings
Conclusions and Future Work
