Abstract

Automatic text summarization is a difficult task that requires a good understanding of the input text to produce a fluent, brief, and informative summary. Applications of text summarization models range from legal document summarization to news summarization. A model must identify where the important information is located in order to produce a good summary. However, infrequent or rare words can limit the model's understanding of the input text, as the model may ignore such words or assign them little attention, especially when the trained model is tested on a dataset whose word-frequency distribution differs from that of the training data. Another issue is that the model accepts only a limited number of tokens (words) of the input text, which may contain redundant information or omit important information that appears later in the text. To address the problem of rare words, we propose a modification to the attention mechanism of a transformer model with a pointer-generator layer, in which the attention mechanism receives frequency information for each word, helping to boost the attention given to rare words. Additionally, our proposed supervised learning model uses a hybrid approach, incorporating both extractive and abstractive elements, to supply more important information to the abstractive model in a news summarization task. We designed experiments combining six different hybrid models with varying input sizes (measured in tokens) to test the proposed model. Four well-known news-article datasets were used in this work: CNN/DM, XSum, Gigaword, and DUC 2004 Task 1. Results were evaluated using the widely used ROUGE metric. Our best model achieved an R-1 score of 38.22, an R-2 score of 15.07, and an R-L score of 35.79, outperforming three existing models by several ROUGE points.
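The abstract does not specify how frequency information enters the attention mechanism, so the following is only a minimal, hypothetical sketch of one plausible formulation: per-token corpus frequencies add a rarity bias to standard scaled dot-product attention. The function name, the inverse-log-frequency bias, and the `alpha` weighting are illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn.functional as F

def frequency_aware_attention(q, k, v, token_freqs, alpha=1.0):
    """Scaled dot-product attention with a hypothetical rarity bias.

    q, k, v:      (batch, seq_len, d) query/key/value tensors
    token_freqs:  (batch, seq_len) corpus frequency counts (>= 1) per source token
    alpha:        assumed scalar weight for the frequency bias
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (batch, len_q, len_k)
    # Rare tokens get a larger additive bias via inverse log-frequency.
    rarity = 1.0 / torch.log1p(token_freqs.float())    # (batch, len_k)
    scores = scores + alpha * rarity.unsqueeze(1)      # broadcast over queries
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights
```

An additive bias of this kind keeps the modification orthogonal to the rest of the transformer, which is one natural way to inject frequency information; the paper's actual integration with the pointer-generator layer may differ.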
