Abstract

This paper reports further results of ongoing research analyzing the impact of a range of commonly used statistical and semantic features in extractive text summarization. The features examined include word frequency, inverse sentence and term frequencies, stopword filtering, word senses, resolved anaphora and textual entailment. The results demonstrate the relative importance of each feature and the limitations of the available tools. Inverse sentence frequency combined with term frequency yields almost the same results as term frequency combined with stopword filtering, which in turn proved to be a highly competitive baseline. To improve the suboptimal results of anaphora resolution, the system was extended with a second anaphora resolution module. The paper also describes first attempts at an internal document data representation.

Highlights

  • Research in extractive Text Summarization (TS) covers a wide range of features used to determine the most salient text segments for inclusion in the final summary

  • Combining inverse sentence frequency (ISF) with anaphora resolution (AR), word sense disambiguation (WSD) and textual entailment (TE) decreases the quality of generated summaries

  • The results show that semantic-based methods involving anaphora resolution, textual entailment and word sense disambiguation benefit the redundancy detection stage

Summary

Introduction

Research in extractive Text Summarization (TS) covers a wide range of features used to determine the most salient text segments for inclusion in the final summary. Different approaches select different features and methods, ranging from basic ones such as term frequency [1], the position of the sentence within the original document [2, 3], higher weights for sentences containing title terms [2] and inverse sentence frequency [4], to more complex ones such as word sense disambiguation [5], latent semantic analysis and anaphora resolution [6], and textual entailment [7]. The aim of the present research is to assess the relative importance of a set of features and their impact on extractive summary generation. The inspected set of features and methods includes term frequency, inverse term and sentence frequencies, word sense disambiguation, anaphora resolution, textual entailment recognition and corpus-tailored stopwords. The BART coreference resolution tool [9] was integrated to compare its results with those of Java RAP.
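
To make the two statistical configurations mentioned here more concrete, the following is a minimal sketch of sentence scoring by term frequency, optionally weighted by inverse sentence frequency or preceded by stopword filtering. It is not the system described in the paper: the excerpt does not give the exact formulas, so the weighting (raw document-level term frequency, ISF as log(N / number of sentences containing the word), averaging over sentence length) is an assumed, commonly used formulation. The function names, the tokenizer and the tiny stopword list are hypothetical and serve only as illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only; the paper uses
# corpus-tailored stopwords that are not given in this excerpt.
STOPWORDS = frozenset({"the", "a", "an", "of", "on", "in", "and", "is", "to"})


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only (assumed tokenizer)."""
    return re.findall(r"[a-z]+", sentence.lower())


def score_sentences(sentences, use_isf=True, stopwords=frozenset()):
    """Return one relevance score per sentence.

    Each word is weighted by its term frequency in the whole document.
    With use_isf=True the weight is multiplied by log(N / sf), where N is the
    number of sentences and sf the number of sentences containing the word.
    The sentence score is the average weight of its remaining tokens.
    """
    tokenized = [[w for w in tokenize(s) if w not in stopwords] for s in sentences]
    tf = Counter(w for toks in tokenized for w in toks)
    sf = Counter(w for toks in tokenized for w in set(toks))
    n = len(sentences)

    scores = []
    for toks in tokenized:
        if not toks:
            # A sentence with no remaining tokens carries no evidence.
            scores.append(0.0)
            continue
        total = 0.0
        for w in toks:
            weight = float(tf[w])
            if use_isf:
                weight *= math.log(n / sf[w])
            total += weight
        scores.append(total / len(toks))
    return scores


def summarize(sentences, k=1, **kwargs):
    """Pick the k highest-scoring sentences and return them in document order."""
    scores = score_sentences(sentences, **kwargs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k=2, use_isf=True)` corresponds to the TF with ISF configuration, while `summarize(sentences, k=2, use_isf=False, stopwords=STOPWORDS)` corresponds to TF with stopword filtering. In this sketch ISF gives zero weight to a word that occurs in every sentence, which is one plausible reason the two configurations can behave similarly: both suppress very common words.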

