Improving Question Classification by Feature Extraction and Selection

Nguyen Van-Tu, Le Anh-Cuong

doi:10.17485/ijst/2016/v9i17/93160

Abstract

Question classification is the task of predicting the entity type of the answer to a given natural-language question. It plays an important role in finding or constructing accurate answers and therefore helps to improve the quality of automated question answering systems. In previous studies, various lexical, syntactic and semantic features were extracted automatically from a question to support classification. However, combining all of those features does not always give the best results for every type of question. Unlike previous studies, this paper focuses on how to extract and select effective features adapted to each question type. First, we propose a method that uses a feature selection algorithm to determine the appropriate features for each question type. Second, we design a new type of feature based on question patterns. We tested the proposed approach on the TREC benchmark dataset, using Support Vector Machines (SVM) as the classification algorithm. The experiments yield accuracies of 95.2% and 91.6% on the coarse-grained and fine-grained data sets respectively, which are considerably better than the results of previous studies.
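The two ideas in the abstract, pattern-based features and feature selection per question type, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the abstract does not name the selection algorithm (a chi-square score is used as a stand-in), the pattern list in `PATTERNS` is hypothetical, and the six questions are toy data, not TREC. The SVM training step is omitted; the selected features would be the input to such a classifier.

```python
import re

# Hypothetical question patterns; the paper's actual pattern set is not
# given in the abstract.
PATTERNS = {
    "PAT_how_quantity": re.compile(r"^how (many|much)\b"),
    "PAT_who": re.compile(r"^who\b"),
    "PAT_where": re.compile(r"^where\b"),
}

def extract_features(question):
    """Return the set of unigram and pattern features for one question."""
    text = question.lower().strip()
    feats = {"w=" + tok for tok in re.findall(r"[a-z0-9]+", text)}
    feats.update(name for name, rx in PATTERNS.items() if rx.search(text))
    return feats

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 feature/class contingency table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

def select_features_per_class(questions, labels, k):
    """For each class, keep the k features with the highest chi-square score.

    Assumption: chi-square stands in for the unnamed selection algorithm.
    """
    feature_sets = [extract_features(q) for q in questions]
    vocab = set().union(*feature_sets)
    selected = {}
    for cls in sorted(set(labels)):
        scores = {}
        for f in vocab:
            # Count the four cells: feature present/absent x class match/mismatch.
            n11 = n10 = n01 = n00 = 0
            for fs, y in zip(feature_sets, labels):
                if f in fs and y == cls:
                    n11 += 1
                elif f in fs:
                    n10 += 1
                elif y == cls:
                    n01 += 1
                else:
                    n00 += 1
            scores[f] = chi_square(n11, n10, n01, n00)
        # Sort by descending score; break ties alphabetically for determinism.
        selected[cls] = sorted(scores, key=lambda f: (-scores[f], f))[:k]
    return selected

# Toy data for illustration only (labels mimic coarse classes).
questions = [
    "How many people live in Hanoi?",
    "How much does a stamp cost?",
    "Who wrote Hamlet?",
    "Who discovered penicillin?",
    "Where is the Louvre?",
    "Where was Einstein born?",
]
labels = ["NUM", "NUM", "HUM", "HUM", "LOC", "LOC"]

selected = select_features_per_class(questions, labels, k=2)
for cls, feats in selected.items():
    print(cls, feats)
```

Note that chi-square is symmetric, so a feature that is strongly absent from a class can also score highly; a real system might filter on the direction of the association before handing the per-class feature sets to the classifier.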
