Question classification in Persian using word vectors and frequencies

Mohammad Razzaghnoori, Hedieh Sajedi, Iman Khani Jazani

doi:10.1016/j.cogsys.2017.07.002

Open Access

https://doi.org/10.1016/j.cogsys.2017.07.002

Abstract

The need for Question Answering (QA) systems becomes evident when one considers that the enormous amount of unstructured data humans create today makes search engines ineffective at providing the exact answer to a given question. An outstanding QA system, however, requires an outstanding Question Classification (QC) system. A question classifier is a system that assigns a label to each question. This problem can be approached in different ways, including rule-based, machine learning, and hybrid approaches. This paper proposes an improved solution for QC using machine learning approaches. Three methods of feature extraction are proposed. The first method uses clustering algorithms to partition the vocabulary into clusters and derives the feature vector for each question from the clustering information. The second method extracts features from questions in a way that avoids recurrent neural networks in favor of feedforward neural networks, which learn faster and need less data. Each question is converted to a feature vector that is obtained by the Word2vec method and weighted by tf-idf coefficients. The results of question classification with Support Vector Machine and Neural Network classifiers indicate the effectiveness of this type of feature vector and, consequently, the high performance of the proposed QC system. Finally, the third approach retains the idea behind the first approach but also takes into account that the data is sequential. The paper concludes that even with a limited amount of data it is reasonable to consider Recurrent Neural Networks.
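The second feature type described in the abstract, word vectors weighted by tf-idf coefficients, can be illustrated with a minimal sketch. The abstract does not state how the weighted vectors are combined, so the weighted average used here is an assumption. The toy English tokens, the two-dimensional `word_vectors` table (a stand-in for trained Word2vec embeddings), and the helper names `idf_table` and `question_vector` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed inverse document frequency for every token in a tokenized corpus."""
    n_docs = len(corpus)
    # Count each token once per question, regardless of repetitions.
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def question_vector(tokens, word_vectors, idf, dim):
    """tf-idf weighted average of the word vectors of one question (assumed combination)."""
    tf = Counter(tokens)
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        if tok not in word_vectors or tok not in idf:
            continue  # skip out-of-vocabulary tokens
        weight = (count / len(tokens)) * idf[tok]
        vec = word_vectors[tok]
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # all-zero vector when no token is known
    return [x / weight_sum for x in total]


# Toy questions (hypothetical; the paper works on Persian questions).
corpus = [
    ["who", "wrote", "the", "book"],
    ["where", "is", "the", "city"],
    ["who", "is", "the", "author"],
]

# Hypothetical 2-d embeddings standing in for trained Word2vec vectors.
word_vectors = {
    "who": [1.0, 0.0], "where": [0.0, 1.0],
    "the": [0.5, 0.5], "is": [0.5, 0.5],
    "wrote": [0.9, 0.1], "book": [0.8, 0.2],
    "city": [0.1, 0.9], "author": [0.9, 0.1],
}

idf = idf_table(corpus)
features = [question_vector(q, word_vectors, idf, dim=2) for q in corpus]
```

Each entry of `features` is a fixed-length vector regardless of question length, which is what allows it to be passed to a feedforward network or an SVM rather than a sequence model. Frequent tokens such as "the" receive a lower idf weight and therefore contribute less to the result.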

Full Text