Abstract

The probability of question redundancy has increased significantly with the growing influx of users on community question answering (cQA) forums such as Quora and Stack Overflow. Because of this redundancy, answers are scattered across multiple variants of the same question, which leads to unsatisfactory search results for a specific question. To address this issue, this work proposes a model for discovering semantic similarity among cQA questions. We followed two approaches: (i) Feature-based: question embeddings are created using four forms of word embeddings and an ensemble of all four, and a Siamese LSTM (sLSTM) is then used to find the semantic similarity among the questions. (ii) Fine-tuning: we fine-tuned a BERT model on STS and SNLI data, employing a Siamese network architecture to generate semantically meaningful sentence embeddings; the resulting sBERT model is then used to assess the similarity between questions. Experiments were carried out on the Quora Question Pairs (QQP) and Stack Exchange cQA datasets with training sets of different sizes and word vectors of different dimensionalities. The model shows significant improvement over the state of the art on sentence similarity tasks.
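To make the fine-tuning approach concrete, the following is a minimal sketch of how question-pair similarity can be scored with a Siamese-BERT (SBERT) encoder using the sentence-transformers library. The checkpoint name and the example questions are illustrative assumptions, not the paper's STS/SNLI fine-tuned model or data.

    # Sketch: scoring a question pair with an SBERT-style encoder.
    # "all-MiniLM-L6-v2" is a public pretrained checkpoint used as a
    # stand-in; the paper fine-tunes its own model on STS and SNLI.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    q1 = "How do I improve my English speaking skills?"
    q2 = "What is the best way to get better at spoken English?"

    # Encode both questions into fixed-size sentence embeddings.
    emb1, emb2 = model.encode([q1, q2], convert_to_tensor=True)

    # Cosine similarity of the two embeddings; scores near 1 suggest
    # the questions are duplicates.
    score = util.cos_sim(emb1, emb2).item()
    print(f"similarity = {score:.3f}")

In practice a threshold on this cosine score (or a classifier over the pair of embeddings) would decide whether two cQA questions are treated as duplicates.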
