Abstract

We address the problem of finding historical questions that are semantically equivalent or relevant to an input query question on community question-answering (CQA) sites. One of the main challenges of this task is that questions are usually long and often contain peripheral information in addition to their main goal. To address this challenge, we propose an end-to-end Hierarchical Compare Aggregate (HCA) model that requires no task-specific features. We first split each question into sentences and compare every sentence pair across the two questions using a proposed Word-Level-Compare-Aggregate model (WLCA-model); the comparison results are then aggregated by a proposed Sentence-Level-Compare-Aggregate model to make the final decision. To handle the scarcity of training data, we propose a sequential transfer learning approach that pre-trains the WLCA-model on a large paraphrase detection dataset. Experiments on two editions of the SemEval benchmark datasets and on the domain-specific AskUbuntu dataset show that our model outperforms state-of-the-art models.
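
The two-level structure described above can be illustrated with a toy, non-learned sketch. This is not the authors' model: `wlca_score`, `slca_score`, and `hca_score` are hypothetical names, the word-level step is an assumed stand-in that aligns each token to its best exact match in the other sentence and averages the results in both directions, and the sentence-level step simply takes the maximum over the sentence-pair score matrix so that a single strongly matching core sentence is not diluted by peripheral ones. The real WLCA and SLCA components are learned neural models, and the pre-training step is not represented here.

```python
import re

def split_sentences(text):
    # Naive splitter: break after ., ? or ! when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    # Lowercased alphanumeric tokens.
    return re.findall(r"[a-z0-9']+", sentence.lower())

def word_similarity(w1, w2):
    # Assumed stand-in for a learned word-level comparison: exact match only.
    return 1.0 if w1 == w2 else 0.0

def wlca_score(s1, s2):
    # Toy word-level compare-aggregate: align each word to its best match
    # in the other sentence (compare), then average both directions (aggregate).
    t1, t2 = tokenize(s1), tokenize(s2)
    if not t1 or not t2:
        return 0.0
    forward = sum(max(word_similarity(w, v) for v in t2) for w in t1) / len(t1)
    backward = sum(max(word_similarity(v, w) for w in t1) for v in t2) / len(t2)
    return (forward + backward) / 2

def slca_score(q1, q2):
    # Toy sentence-level aggregation: score every sentence pair, keep the best.
    s1, s2 = split_sentences(q1), split_sentences(q2)
    if not s1 or not s2:
        return 0.0
    matrix = [[wlca_score(a, b) for b in s2] for a in s1]
    return max(max(row) for row in matrix)

def hca_score(q1, q2):
    # Hierarchical pipeline: sentences -> word-level scores -> question score.
    return slca_score(q1, q2)

if __name__ == "__main__":
    query = "I upgraded to Ubuntu 16.04 yesterday. How do I install the nvidia driver?"
    candidate = "How do I install the nvidia driver? Thanks in advance."
    print(hca_score(query, candidate))
```

In this toy version the peripheral sentences ("I upgraded to Ubuntu 16.04 yesterday.", "Thanks in advance.") do not lower the score, because the sentence-level step keeps only the best-matching pair; a learned aggregator would weigh such sentences more flexibly.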
