Semi-supervised Question Classification Based on Ensemble Learning

Yiyang Li, Lei Su, Jun Chen, Liwei Yuan

doi:10.1007/978-3-319-20472-7_37

Abstract

Traditional question classification requires a large number of labeled questions. However, labeled questions are hard to obtain in practice, whereas unlabeled question samples are easy to obtain in large quantities. How to use these unlabeled samples to improve classification accuracy has therefore become a core problem in question classification. In this paper, a semi-supervised question classification method based on ensemble learning, semi-Bagging, is proposed. The method first trains the classifier on a handful of labeled question samples. The classifier is then retrained using a large number of unlabeled question samples that have been assigned pseudo labels. In experiments on question samples from 15 classes extracted from a community question answering system, the method effectively used a large number of unlabeled question samples together with a few labeled question samples to improve question classification accuracy.
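
The train, pseudo-label, retrain loop described above can be illustrated with a minimal sketch. The abstract does not specify the base classifier, the features, or how pseudo labels are selected, so everything below is an assumption made for illustration: a small bag-of-words naive Bayes model as the base learner, bootstrap resampling for the ensemble, and a hypothetical vote-agreement `threshold` for accepting pseudo labels. It should not be read as the authors' semi-Bagging implementation.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(samples):
    """Train a tiny multinomial naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict_nb(model, text):
    """Return the most probable label under Laplace-smoothed naive Bayes."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for tok in text.lower().split():
            score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def train_bag(samples, n_models, rng):
    """Bagging: train each model on a bootstrap resample of the data."""
    return [train_nb([rng.choice(samples) for _ in samples]) for _ in range(n_models)]


def vote(models, text):
    """Majority vote; also return the fraction of models that agree."""
    votes = Counter(predict_nb(m, text) for m in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)


def semi_bagging(labeled, unlabeled, n_models=15, threshold=0.8, seed=0):
    """Assumed single-round scheme: bag on labeled data, pseudo-label, retrain."""
    rng = random.Random(seed)
    models = train_bag(labeled, n_models, rng)
    pseudo = []
    for text in unlabeled:
        label, agreement = vote(models, text)
        if agreement >= threshold:  # keep only pseudo labels the ensemble agrees on
            pseudo.append((text, label))
    return train_bag(labeled + pseudo, n_models, rng), pseudo
```

Calling `semi_bagging(labeled, unlabeled)` with a list of `(question, label)` pairs and a list of unlabeled question strings returns the retrained ensemble and the pseudo-labeled samples it accepted; `vote(models, question)` then classifies a new question.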

Full Text