Abstract

We address the problem of finding similar historical questions that are semantically equivalent or relevant to an input query question in community question-answering (CQA) sites. One of the main challenges for this task is that questions are usually too long and often contain peripheral information in addition to the main goals of the question. To address this problem, we propose an end-to-end Hierarchical Compare Aggregate (HCA) model that can handle this problem without using any task-specific features. We first split questions into sentences and compare every sentence pair of the two questions using a proposed Word-Level-Compare-Aggregate model called WLCA-model and then the comparison results are aggregated with a proposed Sentence-Level-Compare-Aggregate model to make the final decision. To handle the insufficient training data problem, we propose a sequential transfer learning approach to pre-train the WLCA-model on a large paraphrase detection dataset. Our experiments on two editions of the Semeval benchmark datasets and the domain-specific AskUbuntu dataset show that our model outperforms the state-of-the-art models.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.