Abstract

Question classification is a crucial task for answer selection. It helps define the structure of question sentences by extracting features from a sentence, such as who, when, where, and how. In this paper, we propose a methodology to improve question classification from texts using feature selection and word-embedding techniques. We conducted several experiments to evaluate the performance of the proposed methodology on two datasets (the TREC-6 dataset and a Thai sentence dataset), using term frequency and term frequency-inverse document frequency over Unigram, Unigram+Bigram, and Unigram+Trigram features. Machine learning models based on both traditional and deep learning classifiers were used. The traditional classification models were Multinomial Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM). The deep learning models were Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Networks (CNN), and a hybrid model combining CNN and BiLSTM. The experimental results showed that our methodology based on Part-of-Speech (POS) tagging was the most effective at improving question classification accuracy. Question classification achieved an average micro F1-score of 0.98 when the SVM model was applied with all POS tags added on the TREC-6 dataset. On the Thai sentence dataset, the highest average micro F1-score was 0.80, achieved by the CNN model with GloVe embeddings and focusing tags added.
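To make the feature-extraction step concrete, the following is a minimal sketch of term-frequency Unigram+Bigram features with POS tags appended to tokens, as described above. The `POS` lookup table and the tag names (`WP`, `V`, `Det`, `N`) are hypothetical stand-ins for the output of a real POS tagger, which the actual pipeline would use for English or Thai:

```python
from collections import Counter

# Toy POS lookup standing in for a real tagger (hypothetical values;
# the paper's pipeline would use an actual POS tagger).
POS = {"who": "WP", "wrote": "V", "the": "Det", "play": "N", "hamlet": "N"}

def add_pos_tags(tokens):
    """Append each token's POS tag, e.g. 'wrote' -> 'wrote/V'."""
    return [f"{t}/{POS.get(t, 'X')}" for t in tokens]

def ngram_tf(tokens, n_max=2):
    """Term-frequency features over unigrams up to n_max-grams."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

tokens = add_pos_tags("who wrote the play hamlet".split())
feats = ngram_tf(tokens, n_max=2)  # e.g. feats["who/WP wrote/V"] == 1
```

The resulting sparse count vectors would then be fed to a classifier such as SVM, or reweighted by inverse document frequency for the TF-IDF variants.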

Highlights

  • In recent years, question answering applications have been required to retrieve answers from large amounts of information

  • The results showed that the Support Vector Machine (SVM), with focusing POS tags (N+V+Adj+Det) added to the input, achieved the highest average micro F1-score of 0.7654 and macro F1-score of 0.7685

  • We evaluate the performance of question classification by comparing F1-scores grouped by question class across the different inputs produced by our proposed data preprocessing tasks on the TREC-6 dataset and the Thai sentence dataset

Summary

Introduction

Question answering applications are required to retrieve answers from large amounts of information. Questioning is the key to gaining more information and is very useful in many applications. We use questions to ask for information or to seek answers, while readers seeking an answer must engage much more deeply with the problem of extracting the meaning of a text in a rich sense. Readers always seek answers based on the type of question encountered: because a question and its corresponding answer are related through the question type, readers can answer a question based on its keywords. Words with the same meaning appearing in different questions make it complicated to train a text model to understand language the way humans do.
