Ad-hoc Information Retrieval Research Articles

Extractive speech summarization, which aims to select an indicative set of sentences from a spoken document so as to succinctly represent its most important aspects, has attracted much research attention over the years. In this paper, we cast extractive speech summarization as an ad-hoc information retrieval (IR) problem and investigate various language modeling (LM) methods for important sentence selection. The main contributions of this paper are four-fold. First, we explore a novel sentence modeling paradigm built on the notion of relevance, where the relationship between a candidate summary sentence and the spoken document to be summarized is discovered through different granularities of context for relevance modeling. Second, not only lexical but also topical cues inherent in the spoken document are exploited for sentence modeling. Third, we propose a novel clarity measure for use in important sentence selection, which helps quantify the thematic specificity of each sentence, a crucial indicator that is orthogonal to the relevance measure provided by the LM-based methods. Fourth, in an attempt to lessen the summarization performance degradation caused by imperfect speech recognition, we investigate different levels of index features for LM-based sentence modeling, including words, subword-level units, and their combination. Experiments on broadcast news summarization indicate that our methods compare favorably with several existing well-developed and/or state-of-the-art methods.
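As a rough illustration of LM-based sentence selection, the sketch below ranks sentences by the log-likelihood of the whole document under a smoothed unigram model of each sentence. The abstract does not specify the model form, so several things here are assumptions: the function name `rank_sentences`, the unigram model, Jelinek-Mercer smoothing with weight `lam`, whitespace tokenization, and the use of the document itself as the background model. It does not implement the relevance modeling, topical cues, clarity measure, or subword features described above.

```python
import math
from collections import Counter


def rank_sentences(sentences, lam=0.5):
    """Return sentence indices ordered by log P(document | sentence model).

    Assumption: unigram sentence models with Jelinek-Mercer smoothing,
    using the document's own word distribution as the background model.
    """
    tokenized = [s.lower().split() for s in sentences]
    doc_tokens = [w for toks in tokenized for w in toks]
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)

    scored = []
    for idx, toks in enumerate(tokenized):
        sent_counts = Counter(toks)
        sent_len = len(toks)
        log_lik = 0.0
        for word, count in doc_counts.items():
            # Maximum-likelihood estimate from the sentence (0 if unseen).
            p_sent = sent_counts[word] / sent_len if sent_len else 0.0
            # Background probability is always positive for document words.
            p_bg = count / doc_len
            log_lik += count * math.log(lam * p_sent + (1 - lam) * p_bg)
        scored.append((log_lik, idx))

    # Highest likelihood first; ties broken by original position.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored]


if __name__ == "__main__":
    doc = ["stocks fell sharply", "weather", "stocks fell again"]
    print(rank_sentences(doc))
```

A summary would then be built by taking the top-ranked sentences up to a length budget. In practice the background model would more likely come from a large collection than from the document itself.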

Statistical language modeling has been successfully developed for speech recognition and information retrieval. Minimum classification error (MCE) training has been used to enhance speech recognition performance by minimizing the word error rate. This paper presents a new minimum rank error (MRE) algorithm for n-gram language model training. Here the language models are estimated for information retrieval rather than speech recognition, with average precision as the target metric. Maximizing average precision is closely linked to minimizing the rank error, that is, to optimizing the order of the ranked documents. Accordingly, this paper calculates the rank error loss function from the misordered pairs of relevant and irrelevant documents in the ranked list. The Bayes risk due to the expected rank loss is minimized to develop the Bayesian retrieval rule for ad-hoc information retrieval. Consequently, discriminative training of the language model is performed by integrating discrimination information from individual relevant documents relative to their corresponding irrelevant documents. Experimental results on TREC collections indicate that the proposed MRE language model improves the ranks of relevant documents and lowers those of irrelevant documents. The MRE method achieves significantly higher average precision for test queries than the maximum likelihood and MCE retrieval models.
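The link between average precision and misordered pairs can be made concrete with a small sketch. The helper names `average_precision` and `misordered_pairs` are hypothetical, and the code only computes the two quantities for a given ranked list of relevance labels. It is not the paper's MRE loss or training procedure, which the abstract does not spell out. Average precision is normalized here by the number of relevant documents present in the list, which assumes every relevant document was retrieved.

```python
def average_precision(ranked_relevance):
    """Average precision of a ranked list of booleans (True = relevant).

    Assumption: all relevant documents appear somewhere in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def misordered_pairs(ranked_relevance):
    """Count (relevant, irrelevant) pairs where the irrelevant one ranks higher."""
    irrelevant_seen = 0
    errors = 0
    for is_relevant in ranked_relevance:
        if is_relevant:
            # Every irrelevant document above this one is a misordered pair.
            errors += irrelevant_seen
        else:
            irrelevant_seen += 1
    return errors


if __name__ == "__main__":
    for ranking in ([True, True, False, False],
                    [True, False, True, False],
                    [False, False, True, True]):
        print(ranking, average_precision(ranking), misordered_pairs(ranking))
```

In these three rankings the pair count rises from 0 to 1 to 4 while average precision falls, which is the relationship the abstract relies on.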

Articles published on Ad-hoc Information Retrieval

Information retrieval algorithms and neural ranking models to detect previously fact-checked information

Document Representation and Query Expansion Models for Blog Recommendation

An analysis of evaluation campaigns in ad-hoc medical information retrieval: CLEF eHealth 2013 and 2014

Empirical Evaluation of Social and Traditional Search Tools for Adhoc Information Retrieval

Combining Relevance Language Modeling and Clarity Measure for Extractive Speech Summarization

Minimum Rank Error Language Modeling

Closing the Vocabulary Gap for Computing Text Similarity and Information Retrieval

Improving English and Chinese Ad-Hoc Retrieval: A Tipster Text Phase 3 Project Report
