Abstract
As a new generation of search engine, the automatic question answering system (QAS) is becoming increasingly important and has become a hotspot of research in computer applications and natural language processing (NLP). Question classification is an indispensable part of a QAS, and its importance to the overall system is well recognized. To further improve the performance of question classification, we explore both the feature extraction and the classification model. Building on existing CNN research, we propose an improved CNN model based on Bagging integrated (ensemble) classification (“W2V + B-CNN” for short) and apply it to question classification. First, taking the characteristics of short texts into account, we use the Word2Vec tool to map each word to a vector of fixed dimension and organize each question into a two-dimensional matrix, similar to an image. Then, the trained word vectors are used as the input of the CNN for feature extraction. Finally, the Bagging integrated classification algorithm replaces the Softmax classifier of the traditional CNN. The advantage of the W2V + B-CNN model is that it combines the strengths of both components: the feature extraction capability of the CNN captures the latent features of natural language questions, and the classification capability of the integrated algorithm is then applied to those features, which helps improve accuracy in question classification. Comparative experiments show that W2V + B-CNN performs significantly better than the CNN and other classification algorithms in question classification.
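The final step, replacing a single Softmax classifier with a Bagging vote, can be illustrated with a minimal sketch. It is not the model evaluated here: the base learner (a nearest-centroid classifier), the hand-written two-dimensional feature vectors standing in for CNN outputs, the labels `LOC` and `NUM`, and all function names are assumptions made only for illustration.

```python
import random
from collections import Counter


def train_centroid_classifier(features, labels):
    """Hypothetical base learner: store one mean vector (centroid) per class."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: sq_dist(model[y]))


def train_bagging(features, labels, n_estimators=11, seed=0):
    """Train each base learner on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(features)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_centroid_classifier(
            [features[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Majority vote over the predictions of all base learners."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Stand-ins for feature vectors that a CNN would extract from questions.
features = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0],
            [0.1, 0.9], [0.2, 0.8], [0.0, 1.0]]
labels = ["LOC", "LOC", "LOC", "NUM", "NUM", "NUM"]

models = train_bagging(features, labels)
prediction = predict_bagging(models, [0.95, 0.05])
```

Because each base learner sees a different resample of the training set, the vote tends to be less sensitive to individual training examples than any single classifier.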
Highlights
In the Internet age, information has exploded
The results in Table 2 show that, compared with traditional feature extraction methods, the method based on Word2Vec achieves a higher accuracy rate. The main reason is that feature vectors learned from word vectors overcome the data feature sparseness problem of traditional feature training methods
Taking the characteristics of short texts into account, we used the Word2Vec tool to map each word to a vector of fixed dimension, organized each question into a two-dimensional matrix similar to an image, and used that matrix as the input of the CNN model. The advantage is that this addresses the data sparsity problem of traditional question classification methods and greatly improves the classification accuracy
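The matrix construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: randomly generated toy vectors stand in for trained Word2Vec vectors, and the whitespace tokenization, zero vectors for out-of-vocabulary words, zero padding, and the names `build_toy_embeddings` and `question_to_matrix` are hypothetical choices not taken from the source.

```python
import random


def build_toy_embeddings(vocab, dim, seed=0):
    """Stand-in for trained Word2Vec vectors: one random dim-sized vector per word."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def question_to_matrix(question, embeddings, max_len, dim):
    """Turn a question into a max_len x dim matrix, one row per word.

    Assumptions: lowercase whitespace tokenization, zero vectors for words
    missing from the vocabulary, truncation and zero padding to max_len rows.
    """
    tokens = question.lower().rstrip("?").split()
    rows = [list(embeddings.get(tok, [0.0] * dim)) for tok in tokens[:max_len]]
    while len(rows) < max_len:
        rows.append([0.0] * dim)
    return rows


vocab = ["what", "color", "is", "the", "skin"]
embeddings = build_toy_embeddings(vocab, dim=4)
matrix = question_to_matrix(
    "What color is the skin of Chinese people?", embeddings, max_len=10, dim=4)
```

Every question yields a matrix of the same shape regardless of its length, which is what allows a CNN to treat it like a fixed-size image.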
Summary
In the Internet age, information has exploded. Faced with massive amounts of fragmented information, people increasingly want to obtain accurate and concise information quickly, and the QAS has emerged in response. The QAS is a high-level form of information retrieval and has become a focus of current research in natural language processing. For example, for the question “What color is the skin of Chinese people,” the system directly gives the answer “yellow.” This greatly improves users’ query efficiency and better meets their needs. The QAS generally includes three main parts: question analysis, information retrieval, and answer extraction [1]. The parts cooperate with one another to efficiently obtain the target information required by the user. It can be seen that the result of question classification can provide useful guidance for the other modules.