A weighted word embedding based approach for extractive text summarization

Ruby Rani, Daya K Lobiyal

doi:10.1016/j.eswa.2021.115867

Abstract

Automatic text summarization (ATS) condenses a long text document into an abridged form while retaining its primary information and central theme. Numerous ATS models have already been proposed. However, many of them do not capture the semantic features and latent meanings of text documents. In this paper, we present a weighted word vector representation method based on TF-IDF for ATS. The proposed model is a promising method for the large volume of text data on the internet, as it can capture the semantic meanings of the text along with its statistical and linguistic features. The proposed word vectors help to strengthen the diversity of the generated summary by discriminating between semantically dissimilar sentences. We evaluate the proposed model on news articles taken from the DUC 2007 dataset using the ROUGE summary evaluation metric. Moreover, we compare the proposed model against four state-of-the-art summarization models and observe that our approach outperforms all the baselines and is able to produce coherent, meaningful, diverse, and minimally redundant summaries.
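
As a rough illustration of the idea of TF-IDF-weighted word vectors, the following minimal sketch builds a sentence vector as the TF-IDF-weighted average of its word vectors and compares sentences by cosine similarity. It is not the authors' exact formulation, which the abstract does not specify. The toy `EMBEDDINGS` table, the function names, the smoothed IDF formula, and the choice to treat each sentence as a document are all assumptions made for illustration; a real system would load pretrained word vectors.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional toy embeddings (illustration only);
# a real system would load pretrained word vectors instead.
EMBEDDINGS = {
    "stocks": [0.9, 0.1, 0.0],
    "markets": [0.8, 0.2, 0.0],
    "fell": [0.1, 0.9, 0.0],
    "rain": [0.0, 0.1, 0.9],
    "flooded": [0.0, 0.3, 0.8],
    "roads": [0.1, 0.0, 0.7],
}


def tfidf_weights(sentences):
    """Return one {word: tf-idf} dict per tokenized sentence.

    Assumption: each sentence is treated as a document, and the IDF is
    log(n / df) + 1 so that words occurring in every sentence keep a
    nonzero weight.
    """
    n = len(sentences)
    df = Counter(word for sent in sentences for word in set(sent))
    weights = []
    for sent in sentences:
        tf = Counter(sent)
        weights.append({
            word: (count / len(sent)) * (math.log(n / df[word]) + 1.0)
            for word, count in tf.items()
        })
    return weights


def sentence_vector(word_weights, embeddings, dim):
    """TF-IDF-weighted average of the word vectors of one sentence."""
    vec = [0.0] * dim
    total = 0.0
    for word, weight in word_weights.items():
        if word in embeddings:  # skip out-of-vocabulary words
            for i, x in enumerate(embeddings[word]):
                vec[i] += weight * x
            total += weight
    return [x / total for x in vec] if total > 0 else vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


sentences = [
    ["stocks", "fell"],
    ["markets", "fell"],
    ["rain", "flooded", "roads"],
]
weights = tfidf_weights(sentences)
vectors = [sentence_vector(w, EMBEDDINGS, 3) for w in weights]
```

With these toy vectors, the two finance sentences end up closer to each other than either is to the weather sentence. An extractive summarizer could use such similarities to avoid selecting near-duplicate sentences.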

Full Text