Abstract

Supervised approaches can learn a spoken document summarizer that generates high-quality summaries from a set of training examples matched to the domain of the target documents. However, preparing a sufficient number of in-domain training examples is expensive. In this paper we propose an approach to unsupervised domain adaptation for spoken document summarization that requires no in-domain training examples. A summarizer is first learned from a set of out-of-domain training examples by a supervised summarization approach based on a structured support vector machine, and this summarizer is used to generate a set of initial summaries for the target spoken documents. The target documents and their initial machine-generated summaries then serve as extra training examples for learning a new summarizer, which in turn updates the summaries of the target spoken documents. This process is repeated iteratively to incrementally improve the summarizer for the target spoken documents. In addition, we propose two further techniques: transforming the feature representations based on the data distribution in the target domain, and augmenting the representations with an extra set of domain-specific features. Encouraging results were obtained in summarizing Mandarin-English code-switching course lectures using training examples from Mandarin broadcast news.
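
The iterative procedure described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the paper's summarizer is based on a structured support vector machine, whereas this sketch swaps in a plain logistic-regression sentence scorer so that it stays self-contained. The function names (`train_logreg`, `summarize`, `self_train`), the 30% selection ratio, the number of iterations, and the two-dimensional toy features are all hypothetical. The feature transformation and feature augmentation steps are not shown.

```python
import math

def train_logreg(X, y, epochs=200, lr=0.1):
    """Fit a logistic-regression sentence scorer with plain SGD.
    (Assumption: a stand-in for the paper's structured SVM.)"""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias term
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            g = 1.0 / (1.0 + math.exp(-z)) - yi  # gradient of the log loss
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w

def score(w, x):
    """Linear score of one sentence feature vector."""
    return w[-1] + sum(wj * xj for wj, xj in zip(w, x))

def summarize(w, doc, ratio=0.3):
    """Mark the top-scoring sentences of a document with 1, the rest with 0."""
    k = max(1, round(ratio * len(doc)))
    ranked = sorted(range(len(doc)), key=lambda i: score(w, doc[i]), reverse=True)
    chosen = set(ranked[:k])
    return [1 if i in chosen else 0 for i in range(len(doc))]

def self_train(source_X, source_y, target_docs, iterations=3):
    """Train on out-of-domain data, then repeatedly retrain on the source data
    plus the target documents labeled with their current machine summaries."""
    w = train_logreg(source_X, source_y)
    summaries = [summarize(w, d) for d in target_docs]  # initial summaries
    for _ in range(iterations):
        X, y = list(source_X), list(source_y)
        for doc, labels in zip(target_docs, summaries):
            X.extend(doc)      # target sentences become extra training examples
            y.extend(labels)   # their machine-generated labels act as targets
        w = train_logreg(X, y)
        summaries = [summarize(w, d) for d in target_docs]  # updated summaries
    return w, summaries

# Toy data (hypothetical): each sentence is a two-dimensional feature vector.
source_X = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.7], [0.2, 0.1]]
source_y = [1, 0, 1, 0]  # 1 = sentence belongs to the reference summary
target_docs = [
    [[0.7, 0.9], [0.2, 0.3], [0.1, 0.1], [0.3, 0.2]],
    [[0.1, 0.2], [0.9, 0.6], [0.2, 0.2]],
]
weights, summaries = self_train(source_X, source_y, target_docs)
```

Here `summaries` holds one 0/1 label per target sentence. In this sketch every machine-generated label is trusted equally in each round; whether the paper weights or filters them is not stated in the abstract.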
