An Adaptive Normalized Google Distance Similarity Measure for Extractive Text Summarization

Albaraa Abuobieda,Ahmed Hamza Osman

doi:10.1109/iccis49240.2020.9257668

Abstract

No doubt, for each clustering algorithm running improper similarity calculation, that can lead to reduce the clustering accuracy. Hence, several applications that employ such algorithm are affected negatively and generate improper results. In a previous work, we found that employing the Normalized Google Distance (NGD) similarity measure to cluster document's sentences for text summarization problem is unreasonable; since NGD was basically designed to work with large databases. On the other hand, a term-weighting approach is used widely to define document's contents. In this paper, a term-weighting approach is integrated with the NGD similarity measure to adopt the latter from being able to work in small database (single document). Differential Evolution (DE) algorithm is used to train and test the proposed method. The DUC2002 dataset is preprocessed and used as a test bed. The results showed that our proposed method could outperform the previous work in terms of F-score evaluation measure as well as outperformed the standard baseline methods Microsoft Word and Copernic Summarizer.

Full Text