Learning the Multilingual Translation Representations for Question Retrieval in Community Question Answering via Non-Negative Matrix Factorization

Guangyou Zhou,Jun Zhao,Xiaohua Tony Hu,Tingting He,Zhiwen Xie

doi:10.1109/taslp.2016.2544661

Abstract

Community question answering (CQA) has become an increasingly popular research topic. In this paper, we focus on the problem of question retrieval. Question retrieval in CQA can automatically find the most relevant and recent questions that have been solved by other users. However, the word ambiguity and word mismatch problems bring about new challenges for question retrieval in CQA. State-of-the-art approaches address these issues by implicitly expanding the queried questions with additional words or phrases using monolingual translation models. While useful, the effectiveness of these models is highly dependent on the availability of quality parallel monolingual corpora (e.g., question--answer pairs) in the absence of which they are troubled by noise issues. In this work, we propose an alternative way to address the word ambiguity and word mismatch problems by taking advantage of potentially rich semantic information drawn from other languages. Our proposed method employs statistical machine translation to improve question retrieval and enriches the question representation with the translated words from other languages via non-negative matrix factorization. Experiments conducted on real CQA data sets show that our proposed approach is promising.

Full Text