Abstract
Query-based text summarization aims to extract from the original text the essential information that answers a given query. The answer is presented in a minimal, often predefined, number of words. In this paper we introduce a new unsupervised approach to query-based extractive summarization. It is based on the minimum description length (MDL) principle and employs the Krimp compression algorithm (Vreeken et al., 2011). The key idea of our approach is to select query-related frequent word sets that compress the document sentences better and therefore describe the document better. A summary is extracted by selecting the sentences that best cover the query-related frequent word sets. The approach is evaluated on the DUC 2005 and DUC 2006 datasets, which are specifically designed for query-based summarization (DUC, 2005, 2006). Its results are competitive with the best reported results.
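The sentence-selection step can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It skips the Krimp compression step entirely and restricts frequent word sets to word pairs. A pair counts as query-related if it shares a word with the query, which is a hypothetical test. Sentences are then picked greedily by the number of newly covered query-related pairs per word, under a word budget. The function names and the `min_support` and `max_words` parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokens(sentence):
    # Naive whitespace tokenization; a real system would stem and drop stop words.
    return sentence.lower().split()


def frequent_word_pairs(sentences, min_support):
    """Word pairs that co-occur in at least `min_support` sentences."""
    counts = Counter()
    for sent in sentences:
        counts.update(combinations(sorted(set(tokens(sent))), 2))
    return {pair for pair, c in counts.items() if c >= min_support}


def select_summary(sentences, query, min_support=2, max_words=20):
    """Greedily pick sentences that cover query-related frequent word pairs."""
    query_words = set(tokens(query))
    # Hypothetical relatedness test: the pair shares at least one word with the query.
    targets = {p for p in frequent_word_pairs(sentences, min_support)
               if query_words & set(p)}
    covered, chosen, used = set(), [], 0
    candidates = [i for i, s in enumerate(sentences) if tokens(s)]

    def newly_covered(i):
        words = set(tokens(sentences[i]))
        return {p for p in targets - covered if set(p) <= words}

    while candidates:
        # Score = query-related pairs not yet covered, per word of budget spent.
        best = max(candidates,
                   key=lambda i: len(newly_covered(i)) / len(tokens(sentences[i])))
        gained = newly_covered(best)
        if not gained:
            break  # no remaining sentence adds a new query-related pair
        candidates.remove(best)
        length = len(tokens(sentences[best]))
        if used + length > max_words:
            continue  # too long for the remaining budget; try other sentences
        covered |= gained
        chosen.append(best)
        used += length
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

For example, with the sentences "storm flooding closed coastal roads", "storm flooding hit low areas", "storm winds cut power lines", "power lines failed after storm winds" and the query "storm", the sketch returns the first and third sentences. The other two add no new query-related pairs and are skipped as redundant.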
Highlights
Query-based summarization (QS) is directed toward generating a summary most relevant to a given query
Nguyen et al. (2015) formulate the problem of micro-review summarization within the minimum description length (MDL) framework: the authors view the tips as being encoded by snippets and seek the collection of snippets that produces the encoding with the minimum number of bits
The NUS method uses two features: sentence semantic similarity and redundancy minimization based on Maximal Marginal Relevance (MMR)
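Maximal Marginal Relevance, which the NUS method uses for redundancy minimization, trades off a sentence's relevance to the query against its similarity to the sentences already selected: score(s) = λ·sim(s, q) - (1 - λ)·max over selected s' of sim(s, s'). The sketch below is a generic MMR selector, not the NUS implementation. It assumes bag-of-words cosine similarity for both terms as a stand-in for sentence semantic similarity. The names `mmr_select`, `k`, and `lam` (the weight λ) are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, k=2, lam=0.7):
    """Pick k sentences by lam*sim(s, q) - (1 - lam)*max sim(s, already selected)."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    q = Counter(query.lower().split())
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * cosine(vecs[i], q) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    # Sentences are returned in the order MMR selected them.
    return [sentences[i] for i in selected]
```

With the sentences "storm flooding closed roads", "storm flooding closed roads again", "heavy storm winds cut power lines" and the query "storm flooding power", `lam=0.7` selects the first and third sentences, whereas `lam=1.0` (pure relevance) selects the two near-duplicates.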
Summary
Query-based summarization (QS) is directed toward generating a summary most relevant to a given query. Our approach to QS is based on the MDL principle: the best summary is defined as the one that best compresses the text using query-related information, thereby providing the shortest and most concise description of that text. The MDL principle is widely used in compression techniques for non-textual data, such as the summarization of query results in online analytical processing (OLAP) applications (Lakshmanan et al., 2002; Bu et al., 2005). Only a few works on text summarization using MDL can be found in the literature. Nomoto and Matsumoto (2001) used K-means clustering extended with the MDL principle to find diverse topics in the summarized text. Nomoto (2004) extended the C4.5 classifier with MDL for learning rhetorical relations. Nguyen et al. (2015) formulated the problem of micro-review summarization within the MDL framework: the authors view the tips as being encoded by snippets and seek the collection of snippets that produces the encoding with the minimum number of bits.
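The compression criterion can be made concrete with the two-part code used by Krimp, L(CT, D) = L(CT | D) + L(D | CT). Here a code table CT covers each sentence with disjoint word sets, and the code length of each word set is -log2 of its relative usage in that cover. The sketch below follows that encoding as we understand it from Vreeken et al. (2011), but it simplifies the search. Instead of Krimp's candidate generation and pruning, the caller passes a hand-picked list of extra word sets. Sentences are treated as sets of words, and the function names are hypothetical.

```python
import math
from collections import Counter


def standard_cover_order(itemsets, transactions):
    """Sort by descending size, then descending support, then lexicographically."""
    def support(x):
        return sum(1 for t in transactions if x <= t)
    return sorted(itemsets, key=lambda x: (-len(x), -support(x), sorted(x)))


def cover_usage(code_table, transactions):
    """Greedy cover: take each code-table itemset that fits the still-uncovered words."""
    usage = Counter()
    for t in transactions:
        remaining = set(t)
        for itemset in code_table:
            if itemset <= remaining:
                usage[itemset] += 1
                remaining -= itemset
    return usage


def total_length(extra_itemsets, transactions):
    """Two-part MDL length L(CT|D) + L(D|CT) in bits, for singletons plus extra_itemsets."""
    transactions = [frozenset(t) for t in transactions]
    item_freq = Counter(i for t in transactions for i in t)
    n_items = sum(item_freq.values())
    # Standard code table: Shannon code lengths from singleton frequencies.
    st_len = {i: -math.log2(c / n_items) for i, c in item_freq.items()}
    singletons = [frozenset([i]) for i in item_freq]
    ct = standard_cover_order([frozenset(x) for x in extra_itemsets] + singletons, transactions)
    usage = cover_usage(ct, transactions)
    total_usage = sum(usage.values())
    # Only itemsets actually used in the cover get a code.
    ct_len = {x: -math.log2(u / total_usage) for x, u in usage.items() if u > 0}
    data_bits = sum(usage[x] * ct_len[x] for x in ct_len)  # L(D | CT)
    # L(CT | D): each used itemset's code plus its words spelled out in standard codes.
    model_bits = sum(ct_len[x] + sum(st_len[i] for i in x) for x in ct_len)
    return model_bits + data_bits
```

For the toy sentences "storm flooding roads", "storm flooding power", "storm flooding coast", "storm flooding roads", "winds power", a singletons-only code table needs about 66.99 bits. Adding {storm, flooding} reduces the total to about 50.67 bits, so under MDL that word set gives the better description of the data.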