Abstract

Tree trimming is the problem of extracting an optimal subtree from an input tree, and sentence extraction and sentence compression methods can be formulated and solved as tree trimming problems. Previous approaches require integer linear programming (ILP) solvers to obtain exact solutions. The drawback of this approach is that ILP solvers are black boxes and offer no theoretical guarantee on their computational complexity. We propose a dynamic programming (DP) algorithm for tree trimming problems whose running time is O(NL log N), where N is the number of tree nodes and L is the length limit. Our algorithm exploits the zero-suppressed binary decision diagram (ZDD), a data structure that represents a family of sets as a directed acyclic graph, to represent the set of subtrees in a compact form. The structure of the ZDD permits the application of DP to obtain exact solutions, and our algorithm is applicable to different tree trimming problems. Moreover, experiments show that our algorithm is faster than state-of-the-art ILP solvers and that it scales well to large summarization problems.

Highlights

  • Extractive text summarization and sentence compression are tasks that select a subset of the input textual units that is suitable as a summary or a compressed sentence

  • We have proposed a dynamic programming (DP) algorithm for the tree trimming problems that appear in text summarization

  • Our approach always finds an optimal solution, and it runs in O(NL log N) time, where N is the number of tree nodes and L is the length limit


Summary

Introduction

Extractive text summarization and sentence compression are tasks that select a subset of the input textual units that is suitable as a summary or a compressed sentence. Tree trimming, the problem of finding an optimal subtree of an input tree, is one such combinatorial optimization problem, and it is used in three classes of text summarization: sentence compression (Filippova and Strube, 2008; Filippova and Altun, 2013), single-document summarization (Hirao et al., 2013), and the combination of sentence compression and single-document summarization (Kikuchi et al., 2014). In these tasks, the set of input textual units is represented as a rooted tree whose nodes correspond to minimal textual units such as sentences and words. Because the optimal trimmed subtree preserves the relationships between textual units, it is a concise representation of the original set that preserves linguistic quality.
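
To make the problem statement concrete, the following is a minimal sketch of one simple tree trimming instance. It is not the ZDD-based algorithm proposed in the paper; it uses the textbook preorder-based tree knapsack recurrence instead. It assumes that each node carries an integer length and a numeric score, that the selected subtree must contain the root, and that a node may be selected only if its parent is selected. The function name `trim_tree` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def trim_tree(
    children: Dict[int, List[int]],
    score: Dict[int, int],
    length: Dict[int, int],
    root: int,
    limit: int,
) -> Tuple[int, List[int]]:
    """Return (best score, selected nodes) for a rooted subtree within `limit`.

    Assumption (not from the paper): the subtree must contain `root`, and a
    node can be kept only if its parent is kept.
    """
    # Preorder traversal, so that every subtree is a contiguous block.
    order: List[int] = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(children.get(v, [])))
    n = len(order)
    pos = {v: i for i, v in enumerate(order)}

    # size[i] = number of nodes in the subtree rooted at order[i].
    size = [1] * n
    for i in reversed(range(n)):
        for c in children.get(order[i], []):
            size[i] += size[pos[c]]

    # dp[i][b] = best score obtainable from preorder positions i..n-1 with
    # remaining budget b, given that all ancestors of order[i] are kept.
    dp = [[0] * (limit + 1) for _ in range(n + 1)]
    take = [[False] * (limit + 1) for _ in range(n)]
    for i in reversed(range(n)):
        v = order[i]
        for b in range(limit + 1):
            best = dp[i + size[i]][b]  # drop v together with its subtree
            if length[v] <= b:
                cand = score[v] + dp[i + 1][b - length[v]]  # keep v
                if cand > best:
                    best = cand
                    take[i][b] = True
            dp[i][b] = best

    if length[root] > limit:
        return 0, []  # even the root alone does not fit

    # The root is forced into the solution; the rest follows the table.
    b = limit - length[root]
    total = score[root] + dp[1][b]
    chosen = [root]
    i = 1
    while i < n:
        if take[i][b]:
            chosen.append(order[i])
            b -= length[order[i]]
            i += 1
        else:
            i += size[i]
    return total, chosen


# Toy tree: 0 -> {1, 2}, 1 -> {3}; every node has length 1.
children = {0: [1, 2], 1: [3]}
score = {0: 1, 1: 2, 2: 5, 3: 10}
length = {0: 1, 1: 1, 2: 1, 3: 1}
print(trim_tree(children, score, length, root=0, limit=3))
```

In the toy tree, node 3 has the highest score but can be kept only together with its parent, node 1. With a limit of 3 the path 0, 1, 3 is therefore preferred, while with a limit of 2 the best choice is the root plus node 2.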

