Phrase Embedding Based Multi Document Summarization with Reduced Redundancy using Maximal Marginal Relevance

Sakkaravarthy Iyyappan K, S.R. Balasundaram

doi:10.1109/iceltics50595.2020.9315474

Abstract

With the exponential growth of textual data in the Internet era, Multi-Document Summarization (MDS) has become an essential NLP task that aims to produce a concise representation of the main ideas of multiple related documents. Producing a non-redundant summary is challenging because different authors express the same content in lexically diverse ways. This paper proposes a new multi-document summarization system based on phrase embedding and a greedy Maximal Marginal Relevance (MMR) algorithm. The approach treats phrases as the basic meaningful semantic units of sentences for understanding and summarizing documents. Embedding techniques are used to learn vector representations of phrases so that semantically similar phrases can be identified. Finally, an MMR-based greedy algorithm selects sentences containing important phrases while reducing redundancy among similar phrases. Experimental results on the DUC 2004 benchmark dataset show performance gains over state-of-the-art baselines.
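The greedy MMR selection step mentioned in the abstract can be illustrated with a minimal sketch. This is the generic textbook form of MMR (relevance traded off against similarity to already selected items), not the authors' implementation. The abstract does not specify how phrase embeddings are scored or aggregated, so the function names `cosine` and `mmr_select`, the trade-off parameter `lam`, and the toy two-dimensional vectors standing in for sentence representations are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_select(vectors, query, k, lam=0.7):
    """Greedily pick up to k indices, balancing relevance against redundancy.

    vectors: candidate sentence vectors (hypothetical stand-ins for
             representations built from phrase embeddings)
    query:   vector representing the topic, e.g. a centroid (assumption)
    lam:     1.0 means relevance only; lower values penalize redundancy more
    """
    relevance = [cosine(v, query) for v in vectors]
    candidates = list(range(len(vectors)))
    selected = []

    def mmr_score(i):
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max(
            (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
        )
        return lam * relevance[i] - (1.0 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: sentences 0 and 1 are duplicates, sentence 2 is different.
vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.5]
print(mmr_select(vectors, query, k=2, lam=0.5))  # duplicate is skipped
print(mmr_select(vectors, query, k=2, lam=1.0))  # relevance only
```

With `lam=0.5` the duplicate of the first pick is penalized and the dissimilar sentence is chosen second, whereas `lam=1.0` ignores redundancy and picks the duplicate. In a phrase-based system the redundancy term would presumably be computed over similar phrases rather than whole-sentence vectors, but the abstract gives no further detail.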
