Abstract

Text summarization generates a shorter version of a source document while preserving its salient information. Most existing work on text summarization is evaluated using Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores. However, because ROUGE scores are computed from n-gram overlap, they do not reflect semantic correspondence between generated and reference summaries. Because Korean is an agglutinative language that combines various morphemes into a single word expressing several meanings, ROUGE is not suitable for Korean summarization. In this paper, we propose evaluation metrics that reflect the semantic meaning of a reference summary and the original document: the Reference and Document Aware Semantic Score (RDASS). We then propose a method for improving the correlation of the metrics with human judgment. Evaluation results show that the correlation with human judgment is significantly higher for our evaluation metrics than for ROUGE scores.
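
To make the n-gram limitation concrete, the following is a minimal sketch of a ROUGE-N style recall, not the official ROUGE implementation. The whitespace tokenizer, the function names `ngrams` and `rouge_n_recall`, and the English example sentences are all assumptions introduced here for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams.

    Assumption: tokens are obtained by naive whitespace splitting.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical sentences, chosen only to illustrate the effect.
reference = "the company reported higher profits"
paraphrase = "earnings at the firm rose"                # similar meaning, different words
near_copy = "the company reported higher losses"        # opposite meaning, same words

print(rouge_n_recall(paraphrase, reference))  # 0.2
print(rouge_n_recall(near_copy, reference))   # 0.8
```

In this toy case the paraphrase scores far lower than the near copy even though the near copy reverses the meaning, which is the kind of mismatch a semantic metric is meant to address.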

Highlights

  • The task of text summarization is to generate a reference summary that conveys all the salient information of an original document

  • We report the correlation of the proposed evaluation metrics with recall-oriented understudy for gisting evaluation (ROUGE) to show that the proposed methods complement ROUGE

  • Among the proposed evaluation metrics, s(p, r) performed better than s(p, d), and the Reference and Document Aware Semantic Score (RDASS) showed the highest correlation with human judgment. These results indicate that the proposed evaluation metrics can reflect deep semantic meaning, overcoming the limitations of ROUGE, which is based on n-gram overlap
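
The scores named in the highlights can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that p, r, and d stand for the generated summary, the reference summary, and the source document, that each has already been mapped to a fixed-length vector by some sentence encoder (not shown), that s(·, ·) is cosine similarity, and that the combined score is the simple average of s(p, r) and s(p, d). None of these details are stated in this section. The function names `cosine` and `rdass_sketch` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rdass_sketch(p_vec, r_vec, d_vec):
    """Return (s(p, r), s(p, d), combined score).

    Assumption: the combined score is the mean of the two similarities.
    """
    s_pr = cosine(p_vec, r_vec)  # generated summary vs. reference summary
    s_pd = cosine(p_vec, d_vec)  # generated summary vs. source document
    return s_pr, s_pd, (s_pr + s_pd) / 2


# Toy 2-dimensional vectors standing in for real sentence embeddings.
p_vec, r_vec, d_vec = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
print(rdass_sketch(p_vec, r_vec, d_vec))  # (1.0, 0.0, 0.5)
```

Because the comparison happens in embedding space rather than over surface tokens, a paraphrased summary can still score highly against the reference and the document, which n-gram overlap cannot capture.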


Summary

Introduction

The task of text summarization is to generate a reference summary that conveys all the salient information of an original document. There are two main approaches. The first is extractive: the most noticeable key sentences are extracted from the source and compiled into a reference summary (Zhong et al., 2019; Wang et al., 2019; Xiao and Carenini, 2019). The second is abstractive: a paraphrased summary is generated from the source (Zhang et al., 2018; Guo et al., 2018; Wenbo et al., 2019). In the abstractive case, the generated summary may not contain the same words that appear in the source document, so measuring factual alignment between the generated summary and the source document is important (Kryscinski et al., 2019).

