Neural Duplicate Question Detection without Labeled Training Data

Andreas Rücklé, Nafise Sadat Moosavi, Iryna Gurevych

doi:10.18653/v1/d19-1171

Abstract

Supervised training of neural models for duplicate question detection in community Question Answering (CQA) requires large amounts of labeled question pairs, which can be costly to obtain. To minimize this cost, recent work has often used alternative methods, e.g., adversarial domain adaptation. In this work, we propose two novel methods, weak supervision using the title and body of a question and the automatic generation of duplicate questions, and show that both can achieve improved performance even though they do not require any labeled data. We provide a comparison of popular training strategies and show that our proposed approaches are more effective in many cases because they can utilize larger amounts of data from the CQA forums. Finally, we show that weak supervision with question title and body information is also an effective method to train CQA answer selection models without direct answer supervision.
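To make the idea of weak supervision with title and body more concrete, the following minimal sketch shows one way unlabeled forum questions could be turned into training triples. It is an illustration under stated assumptions, not the authors' implementation: the `title`/`body` record layout, the function name `build_wstb_triples`, and the use of randomly sampled bodies from other questions as negatives are all hypothetical choices made for this example.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical record layout: every unlabeled question has a title and a body.
Question = Dict[str, str]
Triple = Tuple[str, str, str]  # (title, own body, body of another question)


def build_wstb_triples(
    questions: List[Question],
    negatives_per_question: int = 1,
    seed: int = 0,
) -> List[Triple]:
    """Build weakly supervised triples from unlabeled questions.

    Assumption for this sketch: a question's own body is treated as the
    positive match for its title, and bodies of randomly chosen other
    questions are treated as negatives. No duplicate labels are used.
    """
    rng = random.Random(seed)
    triples: List[Triple] = []
    for i, question in enumerate(questions):
        # Candidate negatives are all questions except the current one.
        other_indices = [j for j in range(len(questions)) if j != i]
        k = min(negatives_per_question, len(other_indices))
        for j in rng.sample(other_indices, k):
            triples.append(
                (question["title"], question["body"], questions[j]["body"])
            )
    return triples


# Toy, made-up forum data used only to exercise the function.
toy_forum = [
    {"title": "How do I undo a commit?", "body": "I committed by mistake and want to revert it."},
    {"title": "Why is my loop slow?", "body": "Iterating over a large list takes minutes."},
    {"title": "How to parse a date string?", "body": "I have dates stored as text in a file."},
]

for title, positive, negative in build_wstb_triples(toy_forum):
    print(title, "|", positive, "|", negative)
```

Triples of this shape could then be fed to any pairwise or ranking objective. How the resulting pairs are consumed by a particular model is not specified in this summary, so that part is left out of the sketch.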

Highlights

The automatic detection of question duplicates in community Question Answering forums is an important task that can help users to more effectively find existing questions and answers (Nakov et al., 2017; Cao et al., 2012; Xue et al., 2008; Jeon et al., 2005), and to avoid posting similar questions multiple times.
We evaluate common question retrieval and duplicate detection models such as RCNN (Lei et al., 2016) and BiLSTM and compare a wide range of training methods: duplicate question generation (DQG), weak supervision with title and body (WSTB), supervised training, adversarial domain transfer, weak supervision with question-answer pairs, and unsupervised training.
The authors show that the question generation model for DQG can be successfully transferred across similar domains with only minor effects on performance.

Summary

Introduction

The automatic detection of question duplicates in community Question Answering (cQA) forums is an important task that can help users to more effectively find existing questions and answers (Nakov et al., 2017; Cao et al., 2012; Xue et al., 2008; Jeon et al., 2005), and to avoid posting similar questions multiple times. A large number of cQA forums do not contain enough labeled data for supervised training of neural models. Recent work has therefore used alternative training methods, including weak supervision with question-answer pairs (Qiu and Huang, 2015), semi-supervised training (Uva et al., 2018), and adversarial domain transfer (Shah et al., 2018). An important limitation of these methods is that they still rely on substantial amounts of labeled data: either thousands of duplicate questions (e.g., from a similar source domain in the case of domain transfer) or large numbers of question-answer pairs. To train effective duplicate question detection models for the large number of cQA forums without labeled duplicates, we need other methods that do not require any annotations while performing on par with supervised in-domain training.

Publication Date: Jan 1, 2019
Citations: 36
License type: cc-by

Similar Papers

Duplicate Question Detection with Deep Learning Using Word2Vec
P Lavanya Kumari ... P Ram Teja
02 Sep 2021

DeepDup: Duplicate Question Detection in Community Question Answering
Mohomed Shazan Mohomed Jabbar ... Sankalp Prabharkar
23 Jul 2021

Bert-QAnet: BERT-encoded hierarchical question-answer cross-attention network for duplicate question detection
Xuan Zhao ... Jimmy Xiangji Huang
Neurocomputing | VOL. 509
20 Aug 2022

Feature engineering in learning-to-rank for community question answering task
Nafis Sajid ... Muhammad Ibrahim
International Journal of Computers and Applications | VOL. ahead-of-print
07 Aug 2024
