Abstract

Information distillation is the task of extracting relevant passages of text from massive volumes of textual and audio sources, given a query. In this paper, we investigate two approaches that use shallow language processing to answer open-ended distillation queries, such as “List me facts about [event]”. The first is a summarization-based approach that uses the unsupervised maximum marginal relevance (MMR) technique to capture relevant but non-redundant information. The second is based on supervised classification and trains support vector machines (SVMs) to discriminate relevant snippets from irrelevant ones using a variety of features. Furthermore, we investigate the merit of using the ROUGE metric, for its ability to evaluate redundancy, alongside the conventionally used F-measure for evaluating distillation systems. Our experimental results with textual data indicate that SVM and MMR perform similarly in terms of ROUGE-2 scores, while SVM is better than MMR in terms of F1 measure. Moreover, when speech recognizer output is used, SVM outperforms MMR in terms of both scores.
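To make the MMR idea concrete: each candidate snippet is scored by a trade-off between its relevance to the query and its maximum similarity to snippets already selected. The sketch below is a minimal illustration, not the paper's implementation; the cosine-over-bag-of-words similarity, the λ weight, and the example snippets are all assumptions for demonstration.

```python
# Minimal sketch of maximum marginal relevance (MMR) selection.
# score(s) = lam * sim(s, query) - (1 - lam) * max_{t in selected} sim(s, t)
# Similarity here is cosine over bag-of-words counts (an illustrative choice,
# not necessarily the feature representation used in the paper).
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity between two texts using word-count vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query: str, snippets: list[str], k: int = 2, lam: float = 0.7) -> list[str]:
    """Greedily pick k snippets balancing query relevance against redundancy."""
    selected: list[str] = []
    candidates = list(snippets)
    while candidates and len(selected) < k:
        def mmr_score(s: str) -> float:
            relevance = cosine(s, query)
            redundancy = max((cosine(s, t) for t in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With λ close to 1 the selection is driven almost entirely by relevance; lowering λ penalizes snippets that repeat already-selected content, which is how MMR keeps the distilled output non-redundant.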
