Abstract

Supervised training of neural models for duplicate question detection in community Question Answering (CQA) requires large amounts of labeled question pairs, which can be costly to obtain. To minimize this cost, recent work has often used alternative methods, e.g., adversarial domain adaptation. In this work, we propose two novel methods, weak supervision using the title and body of a question and the automatic generation of duplicate questions, and show that both can achieve improved performance even though they do not require any labeled data. We provide a comparison of popular training strategies and show that our proposed approaches are more effective in many cases because they can utilize larger amounts of data from the CQA forums. Finally, we show that weak supervision with question title and body information is also an effective method to train CQA answer selection models without direct answer supervision.

Highlights

  • The automatic detection of question duplicates in community Question Answering forums is an important task that can help users to more effectively find existing questions and answers (Nakov et al, 2017; Cao et al, 2012; Xue et al, 2008; Jeon et al, 2005), and to avoid posting similar questions multiple times

  • We evaluate common question retrieval and duplicate detection models such as RCNN (Lei et al, 2016) and BiLSTM and compare a wide range of training methods: duplicate question generation (DQG), weak supervision with question title and body (WSTB), supervised training, adversarial domain transfer, weak supervision with question-answer pairs, and unsupervised training

  • We show that the question generation model for DQG can be successfully transferred across similar domains with only minor effects on performance

Summary

Introduction

The automatic detection of question duplicates in community Question Answering (CQA) forums is an important task that can help users to more effectively find existing questions and answers (Nakov et al, 2017; Cao et al, 2012; Xue et al, 2008; Jeon et al, 2005), and to avoid posting similar questions multiple times. A large number of CQA forums do not contain enough labeled data for supervised training of neural models. Recent work has therefore used alternative training methods, including weak supervision with question-answer pairs (Qiu and Huang, 2015), semi-supervised training (Uva et al, 2018), and adversarial domain transfer (Shah et al, 2018). An important limitation of these methods is that they still rely on substantial amounts of labeled data: either thousands of duplicate questions (e.g., from a similar source domain in the case of domain transfer) or large numbers of question-answer pairs. To train effective duplicate question detection models for the large number of CQA forums without labeled duplicates, we need other methods that do not require any annotations while performing on par with supervised in-domain training.
