Abstract
Automatically identifying similar questions and ranking them by their similarity to an input question is an important task for any community Question Answering (cQA) system. Various methods have been applied to this task, including conventional machine learning methods with feature extraction and, more recently, deep learning methods. This paper addresses the problem of how to combine the advantages of these different methods in one unified model. Moreover, deep learning models are usually effective only with large amounts of data, while training data sets for cQA problems are often small, so integrating external knowledge into deep learning models becomes all the more important for this problem. To this end, we propose a neural network-based model that combines a Convolutional Neural Network (CNN) with features from other methods, so that the deep learning model is enhanced with additional knowledge sources. In our proposed model, the CNN component learns representations of the two given questions, which are then combined with the additional features through a Multilayer Perceptron (MLP) to measure the similarity between the two questions. We tested our proposed model on the SemEval 2016 Task 3 data set and obtained better results than previous studies on the same task.
Highlights
Nowadays, community Question Answering (cQA) forums such as StackOverflow and Quora are becoming increasingly popular and useful
Whenever a cQA system receives a question, it is natural for it to first determine whether similar questions already exist and, if so, to show the related question-answer pairs from its database before waiting for new answers from other users
The main parts of this paper are as follows: Section III presents the Convolutional Neural Network (CNN) model for question representation and for measuring the similarity between two questions; Section IV presents different external knowledge sources and how to obtain them; Section V, the key part, shows how to integrate the external knowledge features into the CNN model
Summary
In this paper, we address the problem of utilizing different methods and different information sources to improve the accuracy of measuring question similarity and of ranking similar questions with respect to an input question. To this end, we first build on the CNN, a very successful deep learning model, to formulate the problem of measuring the similarity between two questions. Various kinds of additional information are used, including word2vec representations, which represent a word as a vector of real numbers; linguistic features such as words and named entities; and question types and question categories, which are obtained by classification. From the CNN component we generate a joint representation containing these miscellaneous features. Viewed another way, this model is an effective way of enhancing a deep learning model by providing complementary additional knowledge, especially when training data is lacking.
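The pipeline described above (CNN encoding of each question, concatenation with external features, and an MLP that outputs a similarity score) can be illustrated with a minimal, untrained sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the dimensions, the randomly initialised weights, the lazily created random word vectors standing in for word2vec, and the two toy external features (word overlap and length ratio) standing in for the linguistic, question-type, and category features.

```python
import math
import random

random.seed(0)

EMB_DIM = 4     # word-vector size (assumed; real word2vec vectors are larger)
N_FILTERS = 3   # number of convolution filters (assumed)
WINDOW = 2      # filter width in words (assumed)
N_EXTRA = 2     # number of external features (assumed)
HIDDEN = 5      # MLP hidden units (assumed)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Randomly initialised parameters; a real model would learn these by training.
EMBEDDINGS = {}  # word -> vector, filled lazily as a stand-in for word2vec
FILTERS = rand_matrix(N_FILTERS, WINDOW * EMB_DIM)
W_HIDDEN = rand_matrix(HIDDEN, 2 * N_FILTERS + N_EXTRA)
W_OUT = rand_matrix(1, HIDDEN)[0]


def embed(word):
    if word not in EMBEDDINGS:
        EMBEDDINGS[word] = [random.uniform(-1, 1) for _ in range(EMB_DIM)]
    return EMBEDDINGS[word]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cnn_encode(question):
    """Convolve over word windows, apply ReLU, then max-pool per filter."""
    vectors = [embed(w) for w in question.lower().split()]
    while len(vectors) < WINDOW:  # pad short questions with zero vectors
        vectors.append([0.0] * EMB_DIM)
    # Each window is the concatenation of WINDOW consecutive word vectors.
    windows = [sum(vectors[i:i + WINDOW], [])
               for i in range(len(vectors) - WINDOW + 1)]
    return [max(max(0.0, dot(f, win)) for win in windows) for f in FILTERS]


def extra_features(q1, q2):
    """Toy external features (hypothetical): Jaccard overlap and length ratio."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    overlap = len(a & b) / len(a | b) if a | b else 0.0
    ratio = min(len(a), len(b)) / max(len(a), len(b)) if a and b else 0.0
    return [overlap, ratio]


def similarity(q1, q2):
    """Joint representation -> one tanh hidden layer -> sigmoid score."""
    joint = cnn_encode(q1) + cnn_encode(q2) + extra_features(q1, q2)
    hidden = [math.tanh(dot(row, joint)) for row in W_HIDDEN]
    return 1.0 / (1.0 + math.exp(-dot(W_OUT, hidden)))


def rank(query, candidates):
    """Order candidate questions by descending similarity to the query."""
    return sorted(candidates, key=lambda c: similarity(query, c), reverse=True)


if __name__ == "__main__":
    query = "how can I renew my visa"
    pool = ["where to renew a visa", "best restaurants nearby", "visa renewal steps"]
    for candidate in rank(query, pool):
        print(round(similarity(query, candidate), 3), candidate)
```

Because the weights are random, the scores printed here carry no meaning; the sketch only shows how the joint representation is assembled and how the resulting scores would be used to rank candidate questions once the parameters have been trained.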
More From: International Journal of Machine Learning and Computing