Abstract

Existing text summarization methods rely mainly on the mapping between manually labeled reference summaries and the original text for feature extraction, often ignoring the internal structure and semantic features of the source document. As a result, summaries produced by existing models suffer from grammatical errors and semantic deviation from the original text. This paper aims to strengthen the model's attention to the inherent features of the source text so that it can more accurately identify the document's grammatical structure and semantics. To that end, we propose a model that combines a multi-head self-attention mechanism with a soft attention mechanism. An improved multi-head self-attention mechanism is introduced in the encoding stage so that correct syntactic and semantic information receives higher weight, making the generated summary more coherent and accurate. In addition, a pointer network is adopted and the coverage mechanism is improved to address out-of-vocabulary words and repetition when generating summaries. We validate the proposed model on the CNN/DailyMail dataset and evaluate it with the ROUGE metric. The experimental results show that the model improves the quality of the generated summaries compared with other models.
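The paper's exact encoder modifications are not reproduced here, but the core building block it improves upon is standard multi-head self-attention. The following is a minimal NumPy sketch of that mechanism; all weight matrices, dimensions, and the random input are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product self-attention split across several heads.

    X: (seq_len, d_model) token representations.
    Each head attends over the whole sequence independently,
    letting different heads capture different syntactic/semantic
    relations; outputs are concatenated and mixed by Wo.
    """
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    def split(M):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)   # per-head attention distribution
    heads = weights @ Vh                 # (num_heads, seq_len, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

# Toy demonstration with random weights
rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 8, 5, 2
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
out = multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads)
assert out.shape == (seq_len, d_model)
```

The paper's improvement reweights this mechanism so that tokens carrying correct syntactic and semantic information receive higher attention weight; the sketch above shows only the unmodified baseline computation.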

Highlights

  • The internet continuously generates large quantities of text data, and the problem of text information overload is becoming increasingly serious

  • Automatic text summarization extracts a paragraph of content from the original text or generates a paragraph of new content to summarize the main information of the original text

  • In the process of using the sequence-to-sequence model, the researchers found that the model can extract information from the original text, but the text summary generated by the model has out-of-vocabulary and word repetition problems



Introduction

The internet continuously generates large quantities of text data, and the problem of text information overload is becoming increasingly serious. The pointer-generator network uses the traditional soft attention mechanism, which cannot extract the varied semantic and grammatical information within the original text; as a result, the generated summary suffers from grammatical errors and semantic deviation from the original text.
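The pointer-generator framework the paper builds on mixes a generation distribution with a copy distribution, and its coverage mechanism tracks accumulated attention to discourage repetition. A minimal sketch of those two ideas follows; the function names, the tiny vocabulary, and the numbers are illustrative assumptions, not the paper's code.

```python
import numpy as np

def final_distribution(p_gen, vocab_dist, attention, src_ids, extended_vocab_size):
    """Pointer-generator output distribution.

    With probability p_gen the model generates from the fixed vocabulary;
    with probability (1 - p_gen) it copies a source token via attention.
    Source-side out-of-vocabulary tokens get ids beyond the fixed vocab,
    so they remain producible by copying.
    """
    final = np.zeros(extended_vocab_size)
    final[:len(vocab_dist)] = p_gen * vocab_dist
    for pos, tok in enumerate(src_ids):
        final[tok] += (1 - p_gen) * attention[pos]  # copy probability
    return final

def coverage_loss(attention, coverage):
    """Penalize attending again to positions already covered.

    coverage is the running sum of past attention distributions;
    the min() overlap is small only if attention moves to new positions.
    """
    return np.minimum(attention, coverage).sum()

# Toy example: fixed vocab of 4 words; the first source token (id 4) is OOV
vocab_dist = np.array([0.5, 0.2, 0.2, 0.1])
attention = np.array([0.6, 0.3, 0.1])
src_ids = [4, 1, 2]
p = final_distribution(p_gen=0.7, vocab_dist=vocab_dist,
                       attention=attention, src_ids=src_ids,
                       extended_vocab_size=5)
assert abs(p.sum() - 1.0) < 1e-9
assert p[4] > 0  # the OOV source token can still be produced by copying

coverage = np.zeros(3)
coverage += attention  # update running coverage after this decoding step
```

Because the final distribution places probability mass on source-only token ids, out-of-vocabulary words can appear in the summary, and the coverage penalty grows whenever the decoder re-attends to the same positions, which is what suppresses word repetition.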

