Abstract

Multi-document news summarization (MDNS) aims to create a condensed summary that retains the main characteristics of the original set of news documents. Research shows that text representation is one of the keys to MDNS techniques. Bag-of-words (BOW) methods are the most widely used. However, BOW methods generate high-dimensional representation vectors, which require large storage and incur high computational complexity for MDNS. In addition, the representation vectors generated by BOW lack the semantic and temporal information of the words, which limits the performance of MDNS. To address these issues, this paper introduces a word/paragraph embedding method based on neural network modelling. It generates lower-dimensional word/paragraph representation vectors that retain word order, context information, and the semantic relationships between words/paragraphs. Relevance and redundancy are also both critical issues for MDNS. Traditional MDNS methods quantify the relevance among different sentences and then apply a greedy post-processing module to ensure the diversity of the summary. In this study, by contrast, we take relevance, diversity, and the length constraint into account concurrently, employing the density peak clustering (DPC) technique and an integrated sentence scoring method to select the more representative sentences and generate a summary with less redundancy. Experimental results on the DUC2003 and DUC2004 datasets demonstrate the effectiveness of our MDNS method in comparison with state-of-the-art methods.
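
To make the clustering idea concrete, the following is a minimal sketch of generic density peak clustering applied to sentence vectors. It is not the integrated scoring function of this paper: the function names `dpc_scores` and `select_summary`, the Gaussian kernel with cutoff `d_c`, the product `density * separation` as a ranking score, the word budget `max_words`, and the toy 2-D vectors standing in for paragraph embeddings are all assumptions made for illustration.

```python
import numpy as np


def dpc_scores(vectors, d_c=0.3):
    """Return (density, separation) for each sentence embedding.

    density: Gaussian-kernel count of nearby sentences (representativeness).
    separation: cosine distance to the nearest denser sentence (diversity).
    Both definitions follow generic DPC and are assumptions of this sketch.
    """
    X = np.asarray(vectors, dtype=float)
    unit = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    dist = 1.0 - unit @ unit.T              # pairwise cosine distance
    kernel = np.exp(-(dist / d_c) ** 2)
    np.fill_diagonal(kernel, 0.0)           # a sentence does not count itself
    density = kernel.sum(axis=1)

    separation = np.empty(len(X))
    for i in range(len(X)):
        denser = density > density[i]
        if denser.any():
            separation[i] = dist[i, denser].min()
        else:
            # Densest sentence: use its largest distance to any sentence.
            separation[i] = dist[i].max()
    return density, separation


def select_summary(sentences, vectors, max_words=100, d_c=0.3):
    """Rank sentences by density * separation and fill a word budget."""
    density, separation = dpc_scores(vectors, d_c)
    score = density * separation
    chosen, used = [], 0
    for i in np.argsort(-score):
        n_words = len(sentences[i].split())
        if used + n_words <= max_words:
            chosen.append(int(i))
            used += n_words
    # Restore document order for readability.
    return [sentences[i] for i in sorted(chosen)]


# Toy data (hypothetical): two topics, expressed as 2-D unit vectors.
angles = np.deg2rad([0, 10, -10, 90, 100])
toy_vectors = np.stack([np.cos(angles), np.sin(angles)], axis=1)
toy_sentences = [
    "storm hits coast",
    "storm reaches coast",
    "coast hit by storm",
    "schools close today",
    "schools closed today",
]

print(select_summary(toy_sentences, toy_vectors, max_words=6))
```

Because a sentence scores highly only when it is both central to a group of similar sentences and far from any more central sentence, near-duplicates of an already strong sentence rank low without a separate redundancy-removal pass. In practice the vectors would come from a trained embedding model rather than hand-made angles.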
