Abstract

In a community question answering (CQA) system, new untagged questions arrive continuously, and each of them must be assigned a label. Question classification is therefore an important task for CQA. Traditional question classification requires a large number of labeled questions. In practice, unlabeled question samples are easy to obtain in large numbers, whereas large sets of labeled question samples are expensive to obtain. How to use unlabeled samples to improve classification accuracy has therefore become a core problem in question classification. This paper proposes a semi-supervised question classification method based on ensemble learning. First, several classifiers are combined into a single ensemble classifier, which is trained on a small number of labeled question samples. Second, this preliminary trained classifier assigns a pseudo label to each unlabeled question sample. Then, the ensemble classifier is retrained on the labeled question samples together with the large number of pseudo-labeled question samples. Finally, the effectiveness of the method is verified through experiments on question samples from 15 classes extracted from a community question answering system. The experiments demonstrate that the method can effectively use a large number of unlabeled question samples to improve question classification accuracy.
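
The three training steps described above (train on labeled samples, assign pseudo labels, retrain on both) can be illustrated with a minimal sketch. The abstract does not name the base classifiers or the combination rule, so everything concrete here is an assumption made for illustration: the three toy classifiers (`NaiveBayes`, `OverlapClassifier`, `NearestNeighbor`), the majority vote in `Ensemble`, the helper `self_train`, and the example questions are all hypothetical and are not the authors' implementation.

```python
from collections import Counter, defaultdict
import math

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed base learner)."""
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        def score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            s = math.log(self.class_counts[label] / total)
            for w in tokenize(text):
                s += math.log((counts[w] + 1) / denom)
            return s
        return max(sorted(self.class_counts), key=score)

class OverlapClassifier:
    """Picks the class whose training vocabulary shares the most tokens."""
    def fit(self, texts, labels):
        self.vocab = defaultdict(set)
        for text, label in zip(texts, labels):
            self.vocab[label].update(tokenize(text))
        return self

    def predict(self, text):
        tokens = set(tokenize(text))
        return max(sorted(self.vocab), key=lambda c: len(tokens & self.vocab[c]))

class NearestNeighbor:
    """1-nearest-neighbour on Jaccard similarity of token sets."""
    def fit(self, texts, labels):
        self.examples = [(set(tokenize(t)), l) for t, l in zip(texts, labels)]
        return self

    def predict(self, text):
        tokens = set(tokenize(text))
        def jaccard(example):
            union = tokens | example[0]
            return len(tokens & example[0]) / len(union) if union else 0.0
        return max(self.examples, key=jaccard)[1]

class Ensemble:
    """Combines the members by majority vote (assumed combination rule)."""
    def __init__(self, members):
        self.members = members

    def fit(self, texts, labels):
        for member in self.members:
            member.fit(texts, labels)
        return self

    def predict(self, text):
        votes = Counter(m.predict(text) for m in self.members)
        return votes.most_common(1)[0][0]

def self_train(ensemble, labeled_texts, labels, unlabeled_texts):
    # Step 1: train the ensemble on the small labeled set.
    ensemble.fit(labeled_texts, labels)
    # Step 2: the preliminary ensemble gives each unlabeled sample a pseudo label.
    pseudo = [ensemble.predict(t) for t in unlabeled_texts]
    # Step 3: retrain from scratch on labeled plus pseudo-labeled samples.
    ensemble.fit(labeled_texts + unlabeled_texts, labels + pseudo)
    return ensemble, pseudo

# Toy data, invented for this sketch.
labeled = ["how do i install python on windows", "what is the best pasta recipe"]
labels = ["computing", "cooking"]
unlabeled = [
    "install python packages with pip",
    "recipe for tomato pasta sauce",
    "pip fails on windows",
    "how long to boil pasta",
]

ensemble = Ensemble([NaiveBayes(), OverlapClassifier(), NearestNeighbor()])
ensemble, pseudo = self_train(ensemble, labeled, labels, unlabeled)
print(pseudo)
print(ensemble.predict("pip install error"))
```

This sketch accepts every pseudo label unconditionally, which matches the abstract's description most directly. A practical variant might keep only pseudo labels on which the ensemble members agree strongly, since wrong pseudo labels are otherwise fed back into training.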
