Abstract

In the Internet era, the exponential growth of textual data has made Multi-Document Summarization (MDS) an essential NLP task that aims to produce a concise representation of the main ideas of multiple related documents. Producing a non-redundant summary is challenging because different authors express the same content with different wording. This paper proposes a new multi-document summarization system based on phrase embeddings and a greedy Maximal Marginal Relevance (MMR) algorithm. The approach treats phrases as the basic meaningful semantic units of sentences for understanding and summarizing documents. Embedding techniques are used to learn vector representations of phrases so that semantically similar phrases can be identified. Finally, an MMR-based greedy algorithm selects sentences that contain important phrases while reducing redundancy among similar phrases. Experimental results on the DUC 2004 benchmark dataset show performance gains over state-of-the-art baselines.
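
To make the greedy MMR selection step concrete, here is a minimal sketch in Python. It is not the paper's implementation: the abstract does not give the scoring function, so this uses the classic MMR criterion (relevance to a query vector minus similarity to already selected items), applied to generic embedding vectors. The names `cosine`, `mmr_select`, the trade-off parameter `lam`, and the toy vectors are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(vectors, query, k, lam=0.5):
    """Greedily pick up to k indices from `vectors` using the classic MMR score.

    Assumed score: lam * sim(item, query) - (1 - lam) * max sim(item, selected).
    `lam` (hypothetical default 0.5) trades relevance against redundancy.
    """
    selected = []
    candidates = list(range(len(vectors)))

    def score(i):
        relevance = cosine(vectors[i], query)
        # Redundancy is the highest similarity to anything already chosen.
        redundancy = max(
            (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
        )
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy embeddings (made up): items 0 and 1 are near-duplicates, item 2 differs.
vectors = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0]]
query = [0.8, 0.6]  # stand-in for a document-set centroid

print(mmr_select(vectors, query, k=2, lam=0.5))
print(mmr_select(vectors, query, k=2, lam=1.0))
```

With `lam=1.0` the redundancy term vanishes and the selection reduces to ranking by relevance alone, so the two near-duplicate items are both chosen. Lowering `lam` penalizes the second near-duplicate and lets the dissimilar item in instead, which is the redundancy-reducing behavior the abstract describes.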
