Abstract

Generating summaries with more novel and richer semantics is a challenging problem in multi‐document automatic summarization. In this paper, a core semantics extraction model (CSEM) is proposed to improve the novelty and semantic richness of multi‐document summaries. First, to improve richness, semantic units, which are groups of association relations among keywords, are used to express the semantics of texts. Second, to improve novelty, an attenuation function is introduced that adjusts the importance of each semantic unit according to the number of times it appears in the candidate summary sentences. Third, to maximize the novelty and richness of the summary, summary generation is converted into an optimization problem: finding a set of sentences with higher importance. Finally, CSEM extracts the smallest number of sentences that covers the most core semantics in the corpus as the summary. Experimental results on the DUC 2004 benchmark show that our model outperforms state‐of‐the‐art approaches (e.g., OCCAMS_V, JS‐Gen‐2) under the official metric. In particular, the ROUGE‐1 recall of our model is 40.684%, which is higher than that of other approaches (e.g., OCCAMS_V at 38.497% and JS‐Gen‐2 at 36.739%).
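
The selection idea described above can be illustrated with a minimal sketch. It is not the CSEM implementation: the abstract does not give the form of the semantic units, the attenuation function, or the optimizer, so everything here is an assumption made for illustration. Semantic units are approximated as unordered pairs of keywords that co-occur in a sentence, attenuation is a geometric decay (`decay ** count`), and the optimization is a simple greedy loop. The names `semantic_units`, `select_summary`, `weights`, and `decay` are hypothetical.

```python
from itertools import combinations


def semantic_units(sentence, keywords):
    """Assumed stand-in for semantic units: unordered pairs of
    keywords that co-occur in the same sentence."""
    present = sorted({w for w in sentence.lower().split() if w in keywords})
    return set(combinations(present, 2))


def select_summary(sentences, keywords, weights=None, max_sentences=2, decay=0.5):
    """Greedily pick sentences whose semantic units add the most
    attenuated importance (assumed scoring, not the paper's)."""
    weights = weights or {}
    units = [semantic_units(s, keywords) for s in sentences]
    covered = {}  # unit -> how many chosen sentences already contain it
    chosen = []
    while len(chosen) < max_sentences:
        best, best_gain = None, 0.0
        for i, sentence_units in enumerate(units):
            if i in chosen:
                continue
            # Each unit's weight shrinks every time it has already been covered.
            gain = sum(
                weights.get(u, 1.0) * decay ** covered.get(u, 0)
                for u in sentence_units
            )
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining sentence adds any importance
            break
        chosen.append(best)
        for u in units[best]:
            covered[u] = covered.get(u, 0) + 1
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


docs = [
    "the flood damaged the bridge",
    "flood waters hit the bridge again",
    "rescue teams arrived",
]
kw = {"flood", "bridge", "rescue", "teams"}
summary = select_summary(docs, kw, max_sentences=2)
```

In this toy input the first two sentences share the same keyword pair, so once one of them is selected the attenuated gain of the other drops below that of the third sentence, which introduces a new pair. This is the behavior the attenuation step is meant to encourage: repeated semantics contribute less, so the selected set favors novel content.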
