Abstract

Automatic text summarization is currently a topic of great interest in many knowledge fields. Extractive multi-document text summarization methods aim to condense the textual information of a document collection by covering its main content while removing redundant information. The scientific literature offers different term-weighting schemes and similarity measures, both of which are necessary for implementing an automatic summarization system. However, to the best of the authors’ knowledge, there are no studies that analyze the performance of the different schemes and measures. In this paper, all possible combinations of the most common term-weighting schemes and similarity measures used in extractive multi-document text summarization have been implemented, compared, and analyzed. Experiments have been performed on Document Understanding Conferences (DUC) datasets, and model performance has been assessed with eight Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics and the execution time. Results show that the best term-weighting scheme is term frequency–inverse sentence frequency (TF-ISF) and the best similarity measure is cosine similarity. Moreover, the combination of the two obtained the best average results in 87.5% of the ROUGE scores among all combinations.

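To illustrate the best-performing combination reported in the abstract, the following is a minimal sketch of TF-ISF weighting with cosine similarity between sentence vectors. It assumes the common TF-ISF definition tf(t, s) · log(N / sf(t)), where N is the number of sentences and sf(t) is the number of sentences containing term t; the paper's exact formulation may differ, and the function names and example sentences are illustrative only.

```python
import math
from collections import Counter

def tf_isf_vectors(sentences):
    """Build TF-ISF (term frequency * inverse sentence frequency) vectors
    for a list of tokenized sentences. Assumes the common definition
    tf(t, s) * log(N / sf(t)); the paper's exact scheme may vary."""
    n = len(sentences)
    # sf(t): number of sentences in which each term appears
    sent_freq = Counter()
    for sent in sentences:
        sent_freq.update(set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        vec = {term: freq * math.log(n / sent_freq[term])
               for term, freq in tf.items()}
        vectors.append(vec)
    return vectors

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Usage example with hypothetical tokenized sentences
sents = [["automatic", "text", "summarization"],
         ["extractive", "text", "summarization", "methods"],
         ["similarity", "measures", "for", "sentences"]]
vecs = tf_isf_vectors(sents)
print(cosine_similarity(vecs[0], vecs[1]))
```

In an extractive pipeline, pairwise sentence similarities computed this way are typically used to score sentences for selection and to filter out redundant ones.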