Improving supervised learning for meeting summarization using sampling and regression

Shasha Xie, Yang Liu

doi:10.1016/j.csl.2009.04.007

Abstract

Meeting summarization provides a concise and informative summary of lengthy meetings and is an effective tool for efficient information access. In this paper, we focus on extractive summarization, where salient sentences are selected from the meeting transcripts to form a summary. We adopt a supervised learning approach for this task and use a classifier, based on a rich set of features, to decide whether to include each sentence in the summary. We address two important problems associated with this supervised classification approach. First, we propose different sampling methods to deal with the imbalanced data problem in this task, where summary sentences are the minority class. Second, to account for human disagreement in summary annotation, we reframe the extractive summarization task as regression instead of binary classification. We evaluate our approaches on the ICSI meeting corpus, using both human transcripts and speech recognition output, and show performance improvements from the different sampling methods and the regression model.
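The abstract does not specify which sampling methods or regression targets are used, so the following is only a generic sketch of the two ideas, not the authors' method. It assumes random down-sampling of the majority (non-summary) class and a regression target equal to the fraction of annotators who selected a sentence. The names `downsample`, `regression_target`, the `ratio` parameter, and the dictionary layout of each example are all hypothetical.

```python
import random


def downsample(examples, ratio=1.0, seed=0):
    """Randomly drop non-summary sentences to reduce class imbalance.

    Assumption: each example is a dict with a binary "label" key
    (1 = summary sentence, 0 = non-summary sentence). At most
    ratio * (number of positives) negatives are kept.
    """
    rng = random.Random(seed)
    positives = [e for e in examples if e["label"] == 1]
    negatives = [e for e in examples if e["label"] == 0]
    keep = min(len(negatives), int(ratio * len(positives)))
    return positives + rng.sample(negatives, keep)


def regression_target(votes):
    """Turn per-annotator binary votes into a soft target in [0, 1].

    Assumption: the target is the fraction of annotators who selected
    the sentence, which is one simple way to encode disagreement.
    """
    return sum(votes) / len(votes)


# Toy data: 2 summary sentences among 10 (hypothetical values).
data = [{"id": i, "label": 1 if i < 2 else 0} for i in range(10)]
balanced = downsample(data, ratio=1.0)
soft_label = regression_target([1, 0, 1])
```

With `ratio=1.0` the balanced set keeps every summary sentence and an equal number of randomly chosen non-summary sentences. A sentence chosen by two of three annotators gets a target of about 0.67 rather than a hard 0 or 1.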
