Learning for Text Summarization Using Labeled and Unlabeled Sentences

Massih-Reza Amini, Patrick Gallinari

doi:10.1007/3-540-44668-0_164

Abstract

We describe an original machine learning approach to automatic text summarization that works by extracting the most relevant sentences from a document. Since labeled corpora are difficult to collect for this task, we propose a semi-supervised method that uses a small set of labeled sentences together with a large set of unlabeled documents to improve the performance of summarization systems. We show that this method is an instance of the Classification EM algorithm in the case of Gaussian densities, and that it can also be used in a non-parametric setting. Finally, we provide an empirical evaluation on the Reuters news-wire corpus.

Keywords

Unlabeled Data, Baseline System, Initial Partition, Classification Maximum Likelihood, Text Summarization
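
The abstract names the Classification EM (CEM) algorithm with Gaussian densities but gives no details, so the following is only a minimal sketch of a generic semi-supervised CEM loop, not the authors' implementation. It assumes that sentences are already represented as numeric feature vectors, that there are two classes (summary and non-summary sentences) with diagonal covariances, and that the labeled set contains at least one example of each class. The function names `fit_gaussians`, `log_joint`, and `semi_supervised_cem` are hypothetical.

```python
import numpy as np


def fit_gaussians(X, y, n_classes=2, eps=1e-6):
    """M-step: estimate priors, means, and diagonal variances from hard labels."""
    priors, means, variances = [], [], []
    for k in range(n_classes):
        Xk = X[y == k]
        priors.append(len(Xk) / len(X))
        means.append(Xk.mean(axis=0))
        # eps keeps the variance strictly positive (an assumption of this sketch).
        variances.append(Xk.var(axis=0) + eps)
    return np.array(priors), np.array(means), np.array(variances)


def log_joint(X, priors, means, variances):
    """E-step: log p(x, class k) for every row of X and every class k."""
    diff = X[:, None, :] - means[None, :, :]           # shape (N, K, D)
    log_density = -0.5 * (
        np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None]
    ).sum(axis=2)                                      # shape (N, K)
    return np.log(priors)[None, :] + log_density


def semi_supervised_cem(X_lab, y_lab, X_unlab, max_iter=50):
    """Alternate hard assignment of unlabeled points with parameter re-estimation."""
    # Initial partition: parameters estimated from the labeled sentences only.
    params = fit_gaussians(X_lab, y_lab)
    y_unlab = None
    for _ in range(max_iter):
        # C-step: assign each unlabeled point to its most probable class.
        new_y = log_joint(X_unlab, *params).argmax(axis=1)
        if y_unlab is not None and np.array_equal(new_y, y_unlab):
            break  # the partition is stable, so the parameters will not change
        y_unlab = new_y
        # M-step on labeled data (labels fixed) plus hard-labeled unlabeled data.
        params = fit_gaussians(
            np.vstack([X_lab, X_unlab]), np.concatenate([y_lab, y_unlab])
        )
    return params, y_unlab
```

In this sketch the labeled sentences keep their given labels throughout, and only the labels of the unlabeled sentences are revised at each iteration. The hard assignment in the C-step is what distinguishes CEM from standard EM, which would weight each unlabeled point by its posterior probability instead.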
