Abstract

In cross-language question retrieval (CLQR), users pose a new question in one language to search community question answering (CQA) archives for similar questions in another language. Beyond the ranking problem already present in monolingual question retrieval, CLQR must also bridge the language gap. Existing adversarial models for cross-language learning normally rely on a single adversarial component. Since natural languages consist of units at different levels of abstraction, we argue that crossing the language gap adaptively at multiple levels, with multiple adversarial components, should lead to smoother text representations and better CLQR performance. To this end, we first encode questions into multi-layer representations at different abstract levels with a CNN-based model that enhances conventional architectures with diverse kernel shapes and corresponding pooling strategies, so as to capture different aspects of a text segment. We then impose a set of adversarial components on the different layers of the question representation to determine the appropriate abstract levels and their roles in the cross-language mapping. Experimental results on two real-world datasets demonstrate that our model outperforms state-of-the-art CLQR models and is on par with strong machine translation baselines and most monolingual baselines.
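To make the architecture described above concrete, here is a minimal PyTorch sketch of the two ingredients the abstract names: a CNN encoder with diverse kernel shapes that yields one pooled representation per abstract level, and one language discriminator per level trained adversarially via gradient reversal. All class names (MultiLevelEncoder, MultiAdversarialCLQR, GradReverse), dimensions, kernel sizes, and the gradient-reversal formulation are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer, a common device for adversarial
    language adaptation (an assumption here, not confirmed by the abstract)."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient so the encoder learns to fool the discriminator.
        return -ctx.lambd * grad_output, None

class MultiLevelEncoder(nn.Module):
    """CNN encoder producing question representations at several abstract
    levels: one kernel width per level, with max-over-time pooling."""
    def __init__(self, emb_dim=300, n_filters=100, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x):              # x: (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)          # -> (batch, emb_dim, seq_len)
        # One pooled vector per kernel shape / abstract level.
        return [conv(x).relu().max(dim=2).values for conv in self.convs]

class MultiAdversarialCLQR(nn.Module):
    """One language discriminator per representation level; reversed
    gradients push the shared encoder toward language-invariant features
    at whichever abstract levels matter for the cross-language mapping."""
    def __init__(self, n_filters=100, n_levels=3):
        super().__init__()
        # n_levels must match the number of kernel shapes in the encoder.
        self.encoder = MultiLevelEncoder(n_filters=n_filters)
        self.discriminators = nn.ModuleList(
            nn.Sequential(nn.Linear(n_filters, 64), nn.ReLU(),
                          nn.Linear(64, 2))        # 2 languages
            for _ in range(n_levels)
        )

    def forward(self, x, lambd=1.0):
        levels = self.encoder(x)
        lang_logits = [
            disc(GradReverse.apply(h, lambd))
            for h, disc in zip(levels, self.discriminators)
        ]
        return levels, lang_logits

# Usage: a batch of 4 questions, 20 tokens each, with 300-dim embeddings.
q = torch.randn(4, 20, 300)
reps, lang_logits = MultiAdversarialCLQR()(q)
```

During training, the discriminators would be optimized to identify each question's language while the reversed gradients drive the shared encoder toward language-invariant representations; a per-level weight (here a single lambd) would govern how strongly each abstract level participates in crossing the language gap. The retrieval scoring itself, e.g. a similarity between pooled query and candidate representations, is outside this sketch.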

