Abstract
Text summarization is an important task in natural language processing (NLP). Neural summarization models use an encoder-decoder structure to understand a document and rewrite it as a summary. Recent studies have used reinforcement learning (RL) to address two weaknesses of cross-entropy-based training: the bias it can introduce and its failure to optimize directly for evaluation metrics. However, the ROUGE metric, which relies only on n-gram matching, is not a perfect solution. The purpose of this study is to improve summary quality by proposing new reward functions for RL-based text summarization. We propose ROUGE-SIM and ROUGE-WMD, two modifications of the ROUGE function. Unlike ROUGE-L, ROUGE-SIM allows semantically similar words to count as matches. ROUGE-WMD adds a semantic similarity term to ROUGE-L, where the semantic similarity between the article and the summary is computed using Word Mover's Distance (WMD). Models trained with either proposed reward function achieved higher ROUGE-1, ROUGE-2, and ROUGE-L scores than a model trained with ROUGE-L as the reward function. On the Gigaword dataset, the ROUGE-SIM and ROUGE-WMD models scored 0.418 and 0.406 on ROUGE-L, respectively. The two reward functions also outperformed ROUGE-L in abstractiveness and grammaticality.
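The idea of relaxing ROUGE-L so that semantically similar words count as matches can be illustrated with a minimal sketch. This is not the paper's formulation of ROUGE-SIM, which the abstract does not specify. It assumes a ROUGE-L F1 score computed from the longest common subsequence (LCS) and a hypothetical matching rule in which two tokens match when the cosine similarity of their embeddings reaches a threshold. The function names, the threshold of 0.8, and the toy two-dimensional embeddings are all illustrative assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lcs_length(ref, hyp, match):
    """Dynamic-programming LCS length with a pluggable token-match predicate."""
    table = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, h in enumerate(hyp, 1):
            if match(r, h):
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def rouge_l_f1(ref, hyp, match=lambda r, h: r == h):
    """ROUGE-L style F1 from LCS-based precision and recall (exact match by default)."""
    if not ref or not hyp:
        return 0.0
    lcs = lcs_length(ref, hyp, match)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)
    precision = lcs / len(hyp)
    return 2 * precision * recall / (precision + recall)


def soft_match(embeddings, threshold=0.8):
    """Hypothetical relaxed rule: identical tokens, or embeddings that are close enough."""
    def match(r, h):
        if r == h:
            return True
        if r not in embeddings or h not in embeddings:
            return False
        return cosine(embeddings[r], embeddings[h]) >= threshold
    return match


# Toy, hand-made vectors for illustration only (not real word embeddings).
TOY_EMBEDDINGS = {"big": (1.0, 0.1), "large": (0.9, 0.2), "storm": (0.0, 1.0)}

reference = "a big storm hit the coast".split()
candidate = "a large storm hit the coast".split()

exact_score = rouge_l_f1(reference, candidate)
soft_score = rouge_l_f1(reference, candidate, soft_match(TOY_EMBEDDINGS))
print(f"exact-match ROUGE-L F1: {exact_score:.3f}")
print(f"similarity-relaxed F1:  {soft_score:.3f}")
```

With exact matching, the substitution of "large" for "big" shortens the LCS and lowers the score. Under the relaxed rule the two words are treated as a match, so the candidate is not penalized for a paraphrase. In an RL setup, a score of this kind would serve as the reward for a sampled summary.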
More From: IEEE Access