Abstract

Automatically finding similar questions and ranking them by their similarity to an input question is an important task for any community Question Answering (cQA) system. Various methods have been applied to this task, including conventional machine learning methods with feature extraction and, more recently, deep learning methods. This paper addresses the problem of how to combine the advantages of different methods in one unified model. Moreover, deep learning models are usually effective only with large amounts of data, while training data sets in cQA problems are often small, so integrating external knowledge into deep learning models becomes more important for this problem. To this end, we propose a neural network-based model that combines a Convolutional Neural Network (CNN) with features from other methods, so that the deep learning model is enhanced with additional knowledge sources. In the proposed model, the CNN component learns the representation of the two given questions; this representation is then combined with the additional features through a Multilayer Perceptron (MLP) to measure the similarity between the two questions. We tested the proposed model on the SemEval 2016 Task 3 data set and obtained better results than previous studies on the same task.

Highlights

  • Community Question Answering (cQA) forums such as StackOverflow and Quora are becoming increasingly popular and useful

  • When a cQA system receives a question, it is natural for it to first determine whether similar questions already exist and, if so, to show the related question-answer pairs stored in its database before waiting for new answers from other users

  • The main parts of this paper are: Section III presents the Convolutional Neural Network (CNN) model for question representation and for measuring similarity between two questions; Section IV presents the external knowledge sources and how to obtain them; Section V, the central part, shows how to integrate the external knowledge features into the CNN model


Summary

INTRODUCTION

In this paper, we address the problem of how to use different methods and different information sources to improve the accuracy of measuring question similarity and of ranking similar questions with respect to an input question. To this end, we first build on the CNN, a very successful deep learning model, to formulate the problem of measuring the similarity between two questions. Various kinds of additional information are used, including word2vec representations, which represent a word as a vector of real numbers; linguistic features such as words and named entities; and question types and question categories, which are obtained by classification. From the CNN component we generate a joint representation that contains these miscellaneous features. Viewed another way, this model is an effective means of enhancing a deep learning model with complementary additional knowledge, especially when training data is scarce.
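The pipeline described above (a CNN encoding of each question, joined with external features and scored by an MLP) can be illustrated with a minimal sketch. It is not the authors' implementation: the weights are random and untrained, the embeddings are toy vectors rather than word2vec, and the names `conv_max_pool`, `mlp_score`, `init_params`, `similarity`, `embed` and `word_overlap`, as well as the two extra features, are assumptions chosen for illustration.

```python
import math
import random


def conv_max_pool(word_vectors, filters, width):
    """Convolve each filter over windows of `width` words, then max-pool over time."""
    dim = len(word_vectors[0])
    # Zero-pad short questions so that at least one full window exists.
    padded = list(word_vectors) + [[0.0] * dim] * max(0, width - len(word_vectors))
    pooled = []
    for weights in filters:
        activations = []
        for start in range(len(padded) - width + 1):
            window = [x for vec in padded[start:start + width] for x in vec]
            activations.append(math.tanh(sum(w * x for w, x in zip(weights, window))))
        pooled.append(max(activations))
    return pooled


def mlp_score(inputs, hidden_weights, output_weights):
    """One tanh hidden layer followed by a sigmoid output in (0, 1)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in hidden_weights]
    logit = sum(w * h for w, h in zip(output_weights, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def init_params(dim, width, n_filters, n_extra, n_hidden, seed=0):
    """Random, untrained weights; a real model would learn these from data."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "width": width,
        "filters": [rand(dim * width) for _ in range(n_filters)],
        "hidden": [rand(2 * n_filters + n_extra) for _ in range(n_hidden)],
        "output": rand(n_hidden),
    }


def similarity(q1_vectors, q2_vectors, extra_features, params):
    """Encode both questions with shared filters, append extra features, score with the MLP."""
    rep1 = conv_max_pool(q1_vectors, params["filters"], params["width"])
    rep2 = conv_max_pool(q2_vectors, params["filters"], params["width"])
    joint = rep1 + rep2 + list(extra_features)
    return mlp_score(joint, params["hidden"], params["output"])


# Toy embedding table (assumption: stands in for pretrained word2vec vectors).
_rng = random.Random(1)
_vocab = {}


def embed(question, dim=4):
    vectors = []
    for word in question.lower().split():
        if word not in _vocab:
            _vocab[word] = [_rng.uniform(-1.0, 1.0) for _ in range(dim)]
        vectors.append(_vocab[word])
    return vectors


def word_overlap(q1, q2):
    """Jaccard overlap of word sets, used here as one hypothetical conventional feature."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return len(a & b) / len(a | b)


q1 = "how do I renew my visa"
q2 = "where can I renew a visa"
# Second feature: hypothetical flag meaning "both questions have the same question type".
extras = [word_overlap(q1, q2), 1.0]
params = init_params(dim=4, width=2, n_filters=3, n_extra=len(extras), n_hidden=5)
score = similarity(embed(q1), embed(q2), extras, params)
print(f"similarity score (untrained): {score:.3f}")
```

Because the weights are untrained, the printed score carries no meaning; the sketch only shows how the pooled CNN representations of the two questions and the external features end up in a single vector that the MLP maps to a similarity value. Ranking candidate questions would then amount to sorting them by this score.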

RELATED WORK
MODELING CNN FOR QUESTION SIMILARITY MEASUREMENT
EXTERNAL KNOWLEDGE
Conventional Features
Question Type
Word Embedding
Question Category
THE EXTENDED CNN MODEL
Dataset
Experimental Setup and Model Configuration
Results
CONCLUSION
Findings
CONFLICT OF INTEREST