Abstract

As a new type of knowledge-sharing platform, community question answering (CQA) websites let users acquire and share knowledge, and they have become widely popular. For questions that receive many answers, however, assessing answer quality is a challenge. Answer selection in CQA was posed as a shared task in the SemEval competition, which provided a data set and defined two subtasks. Task-A: given a question (a short title and an extended description) and its answers, classify each answer as definitely relevant (good), potentially relevant (potential), or bad or irrelevant (bad, dialog, non-English, other). Task-B: given a YES/NO question (a short title and an extended description) and a set of answers, decide from the answers labeled good whether the overall answer to the question should be yes, no, or uncertain. This paper first preprocesses the data set, then applies natural language processing techniques (word segmentation, part-of-speech tagging, and named entity recognition), and then extracts features from the preprocessed data. Finally, SVM and random forest classifiers are trained on the extracted features, and their classification results are analyzed and compared. The experiments show that both SVM and random forest perform well on the data set and outperform the multi-classifier ensemble learning method and the hierarchical classification method proposed in earlier work.
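The Task-A label scheme described in the abstract can be illustrated with a minimal sketch. It assumes the fine-grained labels are spelled as in the abstract (the actual data set may spell them differently), and the names `TASK_A_CLASSES`, `collapse_label`, and `accuracy` are hypothetical helpers introduced here for illustration; they are not the authors' code. The sketch collapses the six fine-grained labels into the three Task-A classes and computes plain accuracy, the metric the highlights use for comparison.

```python
# Mapping from fine-grained labels (spelling taken from the abstract; an
# assumption about the data set) to the three Task-A classes.
TASK_A_CLASSES = {
    "good": "good",
    "potential": "potential",
    "bad": "bad",
    "dialog": "bad",
    "non-english": "bad",
    "other": "bad",
}


def collapse_label(raw_label: str) -> str:
    """Map a fine-grained answer label to good / potential / bad."""
    key = raw_label.strip().lower()
    if key not in TASK_A_CLASSES:
        raise ValueError(f"unknown Task-A label: {raw_label!r}")
    return TASK_A_CLASSES[key]


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of answers whose predicted class equals the gold class."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    gold = [collapse_label(x) for x in ["Good", "Dialog", "Potential", "Other"]]
    predicted = ["good", "bad", "good", "bad"]
    print(gold)
    print(accuracy(gold, predicted))
```

Any classifier's predictions, whether from an SVM or a random forest, could be scored with the same `accuracy` helper, which keeps a comparison between the two on equal footing.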

Highlights

  • The experiments show that both SVM and random forest perform well on the data set and outperform the multi-classifier ensemble learning method and the hierarchical classification method proposed in earlier work.

  • Based on the comparison, the difference between the Random Forest and SVM methods is small in Task-A, while in Task-B Random Forest gives better results.

  • Compared with previous methods, the Accuracy of our Random Forest and SVM methods is higher than that of Hou's method in both Task-A and Task-B, so we conclude that our Random Forest and SVM methods outperform the multi-classifier ensemble learning method and the hierarchical classification method used by Hou, though the difference is slight.

Summary

Introduction

With the rapid development of Internet technology and the spread of the mobile Internet, the Internet is becoming increasingly social, personalized, and community-oriented. As a new type of knowledge-sharing platform, the CQA (community question answering) website supports the acquisition and sharing of knowledge through its interactivity and its incentive mechanisms, and it meets the personalized knowledge needs of different users. As a result, it has become widely popular. Because CQA sites cover a wealth of content and topics, the answers to users' questions accumulate over time, so that more and more user needs can be met. A growing number of researchers, both domestic and international, have taken up the study of community question answering [7–9].
