Abstract

We propose two novel language models to improve the performance of sentence retrieval in Question Answering (QA): a class-based language model and a trained trigger language model. Because the search in sentence retrieval is conducted over smaller segments of text than in document retrieval, the problems of data sparsity and exact matching become more critical. Different techniques, such as the translation model, have been proposed to overcome the word mismatch problem. Our class-based and trained trigger language models, however, take different approaches to this problem and are shown to outperform the existing models. The class model uses a word clustering algorithm to capture term relationships. In this model, we assume a relation between terms that belong to the same cluster; as a result, they can substitute for one another when searching for relevant sentences. The trigger model captures pairs of trigger and target words while training on a large corpus. The model considers a relation between a question and a sentence if a trigger word appears in the question and the sentence contains the corresponding target word. For both proposed models, we introduce different notions of co-occurrence to find word relations. In addition, we study the impact of corpus size and domain on the models. Our experiments on the TREC QA collection verify that the proposed models significantly improve sentence retrieval performance compared to the state-of-the-art translation model. While the translation model based on mutual information (Karimzadehgan and Zhai, 2010) achieves 0.3927 Mean Average Precision (MAP), the class model achieves 0.4174 MAP and the trigger model further improves performance to 0.4381.
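To make the two matching ideas concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes cluster assignments (for the class model) and weighted trigger/target pairs (for the trigger model) have already been learned offline from a corpus, and the toy `word_to_cluster`, `trigger_weight`, and `score` names are hypothetical placeholders.

```python
# Toy sketch of class-based matching and trigger-pair scoring for
# question/sentence pairs. Resources below are hypothetical examples of
# what would be learned from a large corpus.

# Hypothetical word clusters: terms in the same cluster are substitutable.
word_to_cluster = {"car": 0, "automobile": 0, "vehicle": 0,
                   "invented": 1, "inventor": 1}

# Hypothetical (trigger, target) pairs with association weights.
trigger_weight = {("invented", "inventor"): 0.8,
                  ("car", "automobile"): 0.6}


def class_match(q_term: str, s_term: str) -> bool:
    """Terms match if identical or if they fall in the same word cluster."""
    if q_term == s_term:
        return True
    cq, cs = word_to_cluster.get(q_term), word_to_cluster.get(s_term)
    return cq is not None and cq == cs


def trigger_score(question: list[str], sentence: list[str]) -> float:
    """Sum weights of pairs whose trigger is in the question and whose
    target occurs in the candidate sentence."""
    s_terms = set(sentence)
    return sum(w for (trig, targ), w in trigger_weight.items()
               if trig in question and targ in s_terms)


def score(question: list[str], sentence: list[str]) -> float:
    """Toy combined score: class-based term matches plus trigger evidence."""
    matches = sum(1 for q in question
                  if any(class_match(q, s) for s in sentence))
    return matches + trigger_score(question, sentence)


if __name__ == "__main__":
    q = ["who", "invented", "the", "automobile"]
    s = ["karl", "benz", "is", "credited", "as", "the", "inventor",
         "of", "the", "car"]
    # Matches "the" exactly, "automobile" via the car/automobile cluster,
    # and adds the invented->inventor trigger weight.
    print(score(q, s))
```

In the paper's actual models these relations are folded into language-model probabilities rather than a raw count, but the sketch shows the kind of evidence each model contributes beyond exact term matching.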
