As a new generation of search engine, the automatic question answering system (QAS) is becoming increasingly important and has become a research hotspot in computer applications and natural language processing (NLP). Question classification is an indispensable part of a QAS, and its importance to the system is well recognized. To further improve question classification performance, we explore both feature extraction and the classification model. Building on existing CNN research, we propose an improved CNN model based on Bagging ensemble ("integrated") classification, "W2V + B-CNN" for short, and apply it to question classification. First, taking the characteristics of short texts into account, we use the Word2Vec tool to map each word to a fixed-dimensional vector and organize each question into an image-like two-dimensional matrix. Next, the trained word vectors are fed into the CNN for feature extraction. Finally, the Bagging ensemble classification algorithm replaces the Softmax classifier of the traditional CNN. The W2V + B-CNN model thus combines the strengths of CNN and Bagging ensemble classification: it uses the powerful feature extraction capability of the CNN to extract latent features of natural language questions, and the strong classification capability of the ensemble algorithm to classify those features, which helps improve question classification accuracy. Comparative experiments show that W2V + B-CNN performs significantly better than the CNN and other classification algorithms on question classification.
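The pipeline described above (word vectors stacked into a matrix, convolutional feature extraction, then Bagging in place of Softmax) can be illustrated with a minimal pure-Python sketch. It rests on several assumptions not stated in the abstract: the embeddings and convolution filters are fixed toy values (in the actual model the word vectors come from Word2Vec and the CNN filters are learned), the embedding dimension is 3, and the Bagging base learner is a hypothetical nearest-centroid classifier, since the abstract does not name one. Only the overall structure, bootstrap resampling plus majority voting over CNN-style max-pooled features, is meant to be representative.

```python
import random
from collections import Counter

EMB_DIM = 3  # assumed toy embedding size; real Word2Vec vectors are much larger


def sentence_matrix(tokens, embeddings, max_len):
    """Stack word vectors row by row into a max_len x EMB_DIM matrix.
    Long questions are truncated; short ones and unknown words get zero rows."""
    rows = [embeddings.get(t, [0.0] * EMB_DIM) for t in tokens[:max_len]]
    rows += [[0.0] * EMB_DIM for _ in range(max_len - len(rows))]
    return rows


def conv_max_features(matrix, filters):
    """One feature per filter: slide a (height x EMB_DIM) filter over word
    windows, apply ReLU, then max-pool over all positions."""
    feats = []
    for f in filters:
        h = len(f)
        scores = []
        for i in range(len(matrix) - h + 1):
            s = sum(f[r][c] * matrix[i + r][c] for r in range(h) for c in range(EMB_DIM))
            scores.append(max(0.0, s))  # ReLU
        feats.append(max(scores))  # max-over-time pooling
    return feats


def fit_centroids(X, y):
    """Hypothetical base learner: the mean feature vector of each class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], x)))


def bagging_fit(X, y, n_estimators=15, seed=0):
    """Bagging: train each base learner on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Replace Softmax with a majority vote over the ensemble."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for trained Word2Vec embeddings.
    emb = {"who": [1.0, 0.0, 0.0], "where": [0.0, 1.0, 0.0], "is": [0.0, 0.0, 0.5],
           "wrote": [0.2, 0.0, 0.1], "paris": [0.0, 0.3, 0.1]}
    # Fixed (untrained) filters of heights 1 and 2, width EMB_DIM.
    filters = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]]
    questions = [(["who", "wrote", "hamlet"], "HUM"), (["who", "is", "he"], "HUM"),
                 (["where", "is", "paris"], "LOC"), (["where", "is", "rome"], "LOC")]
    X = [conv_max_features(sentence_matrix(q, emb, 5), filters) for q, _ in questions]
    y = [label for _, label in questions]
    models = bagging_fit(X, y)
    query = conv_max_features(sentence_matrix(["who", "is", "she"], emb, 5), filters)
    print(bagging_predict(models, query))
```

Majority voting over bootstrap-trained learners is what distinguishes Bagging from a single Softmax layer: each learner sees a slightly different resample of the training questions, and averaging their decisions tends to reduce variance.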