Abstract

Recent studies on extractive text summarization formulate it as a combinatorial optimization problem: extract the subset of textual units that maximizes an objective function without violating a length constraint. Although these methods improve automatic evaluation scores, they do not consider the discourse structure of the source document, so the summaries they generate may lack logical coherence. In previous work, we proposed a method that exploits a discourse tree structure to produce coherent summaries. By transforming a traditional discourse tree, namely a rhetorical structure theory-based discourse tree (RST-DT), into a dependency-based discourse tree (DEP-DT), we formulated summarization as a Tree Knapsack Problem whose tree corresponds to the DEP-DT. This paper extends that work with a detailed discussion of the approach and a novel, efficient dynamic programming algorithm for solving the Tree Knapsack Problem. Experiments show that our method not only achieved the highest scores in both automatic and human evaluation, but also performed well in terms of the linguistic quality of the summaries.
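To make the Tree Knapsack formulation concrete, the following is a minimal sketch of a textbook dynamic program over a rooted tree. It is not the efficient algorithm the paper contributes. It assumes the usual form of the problem: each node (a textual unit in the DEP-DT) has a positive integer length and a score, a node may be selected only if its parent is selected, and the total length must not exceed the budget. The function name `tree_knapsack` and the parameters `parent`, `length`, `score`, and `budget` are hypothetical, chosen for illustration only.

```python
from typing import List, Optional, Tuple


def tree_knapsack(parent: List[int], length: List[int],
                  score: List[float], budget: int) -> Tuple[float, List[int]]:
    """Pick a root-connected subset of nodes with maximum total score.

    Assumptions (illustrative, not taken from the paper):
    parent[i] is the parent of node i, and the root has parent -1;
    lengths are non-negative integers; a node needs its parent selected.
    """
    n = len(parent)
    children: List[List[int]] = [[] for _ in range(n)]
    root = -1
    for i, p in enumerate(parent):
        if p < 0:
            root = i
        else:
            children[p].append(i)

    def solve(v: int) -> List[Optional[Tuple[float, Tuple[int, ...]]]]:
        w = length[v]
        # best[b]: (score, nodes) of the best subtree rooted at v with
        # total length <= b, or None when v itself does not fit in b.
        best: List[Optional[Tuple[float, Tuple[int, ...]]]] = [None] * (budget + 1)
        for b in range(w, budget + 1):
            best[b] = (score[v], (v,))
        for c in children[v]:
            child = solve(c)
            merged = list(best)
            for b in range(w, budget + 1):
                # Give cb units of the budget to child c's subtree.
                for cb in range(b - w + 1):
                    if child[cb] is None or best[b - cb] is None:
                        continue
                    cand = best[b - cb][0] + child[cb][0]
                    if cand > merged[b][0]:
                        merged[b] = (cand, best[b - cb][1] + child[cb][1])
            best = merged
        return best

    result = solve(root)[budget]
    if result is None:  # even the root alone exceeds the budget
        return 0.0, []
    return result[0], sorted(result[1])


# Toy tree: node 0 is the root, nodes 1 and 2 depend on 0, node 3 depends on 1.
best_score, selected = tree_knapsack(
    parent=[-1, 0, 0, 1], length=[2, 3, 2, 1],
    score=[1.0, 5.0, 2.0, 4.0], budget=6)
print(best_score, selected)
```

In this toy example, node 3 has a high score but can only be included together with its ancestors 1 and 0, which is how the parent constraint keeps the selected units connected to the root. The sketch runs in roughly O(n·L²) time for n nodes and budget L, and it recurses once per tree level, so it is suitable only for small inputs.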
