Query-based summarization using MDL principle

Marina Litvak, Natalia Vanetik

doi:10.18653/v1/w17-1004

Abstract

Query-based text summarization aims to extract essential information that answers the query from the original text. The answer is presented in a minimal, often predefined, number of words. In this paper we introduce a new unsupervised approach for query-based extractive summarization, based on the minimum description length (MDL) principle, that employs the Krimp compression algorithm (Vreeken et al., 2011). The key idea of our approach is to select frequent word sets related to a given query that compress document sentences better and therefore describe the document better. A summary is extracted by selecting sentences that best cover query-related frequent word sets. The approach is evaluated on the DUC 2005 and DUC 2006 datasets, which are specifically designed for query-based summarization (DUC, 2005, 2006). It is competitive with the best reported results.

Highlights

Query-based summarization (QS) is directed toward generating a summary most relevant to a given query.
Nguyen et al. (2015) formulate the problem of micro-review summarization within the minimum description length (MDL) framework: the authors view the tips as being encoded by snippets, and seek a collection of snippets that produces the encoding with the minimum number of bits.
The NUS method uses two features: sentence semantic similarity and redundancy minimization based on Maximal Marginal Relevance (MMR).

Summary

Introduction

Query-based summarization (QS) is directed toward generating a summary most relevant to a given query. Our approach to QS is based on the MDL principle: the best summary is defined as the one that leads to the best compression of the text with query-related information, by providing its shortest and most concise description. The MDL principle is widely used in compression techniques for non-textual data, such as summarization of query results for online analytical processing (OLAP) applications (Lakshmanan et al., 2002; Bu et al., 2005). Only a few works on text summarization using MDL can be found in the literature. Nomoto and Matsumoto (2001) used K-means clustering extended with the MDL principle to find diverse topics in the summarized text. Nomoto (2004) extended the C4.5 classifier with MDL for learning rhetorical relations. Nguyen et al. (2015) formulate the problem of micro-review summarization within the MDL framework: the authors view the tips as being encoded by snippets, and seek a collection of snippets that produces the encoding with the minimum number of bits.
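
The selection idea described in the abstract (pick sentences that best cover query-related frequent word sets) can be illustrated with a small sketch. This is not the authors' implementation and it does not run Krimp or compute description lengths: it replaces the MDL-based choice of word sets with a plain support threshold, and it uses a simple greedy cover. All function names, the `min_support` and `max_size` parameters, and the whitespace tokenizer are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return {w.strip(".,;:!?()\"'").lower() for w in text.split()} - {""}


def frequent_query_word_sets(sentences, query, min_support=2, max_size=2):
    """Word sets found in >= min_support sentences that share a word with the query.

    Assumption: a support threshold stands in for the MDL/Krimp selection
    of word sets described in the paper.
    """
    query_words = tokenize(query)
    counts = Counter()
    for sent in sentences:
        words = sorted(tokenize(sent))
        for size in range(1, max_size + 1):
            for combo in combinations(words, size):
                if query_words & set(combo):
                    counts[frozenset(combo)] += 1
    return {ws for ws, c in counts.items() if c >= min_support}


def greedy_summary(sentences, query, max_sentences=2, min_support=2):
    """Greedily pick sentences covering the most not-yet-covered word sets."""
    word_sets = frequent_query_word_sets(sentences, query, min_support)
    tokens = [tokenize(s) for s in sentences]
    uncovered = set(word_sets)
    chosen = []
    while uncovered and len(chosen) < max_sentences:
        best, best_gain = None, 0
        for i, toks in enumerate(tokens):
            if i in chosen:
                continue
            gain = sum(1 for ws in uncovered if ws <= toks)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining sentence covers anything new
            break
        chosen.append(best)
        uncovered = {ws for ws in uncovered if not ws <= tokens[best]}
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `greedy_summary(sentences, query)` with a list of document sentences and a query string returns at most `max_sentences` sentences. The sketch ignores stop words and summary length in words, both of which a real system would need to handle.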

Methods

Results

Conclusion

Publication Date: Jan 1, 2017
Citations: 37
License type: cc-by
