Abstract

Background: In question answering (QA) system development, question classification is crucial for identifying information needs and improving the accuracy of returned answers. Although the questions are domain-specific, they are asked by non-professionals, making the question classification task more challenging.

Objective: This study aimed to classify health care–related questions posted by the general public (Chinese speakers) on the Internet.

Methods: A topic-based classification schema for health-related questions was built by manually annotating randomly selected questions. The Kappa statistic was used to measure the interrater reliability of the annotation results from multiple annotators. Using the annotated corpus, we developed a machine-learning method to automatically classify these questions into one of the following six classes: Condition Management, Healthy Lifestyle, Diagnosis, Health Provider Choice, Treatment, and Epidemiology.

Results: The consumer health question schema was developed with four hierarchical levels of specificity, comprising 48 quaternary categories and 35 annotation rules. The 2000 sample questions were coded with 2000 major codes and 607 minor codes. Using natural language processing techniques, we expressed the Chinese questions as a set of lexical, grammatical, and semantic features. The effective features were then selected to improve question classification performance. In the 6-category classification, we achieved an average precision of 91.41%, recall of 89.62%, and F1 score of 90.24%.

Conclusions: In this study, we developed an automatic method to classify questions related to Chinese health care posted by the general public. It enables artificial intelligence (AI) agents to understand Internet users’ information needs on health care.
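The two measurements named in the abstract, interrater agreement and averaged precision, recall, and F1, can be illustrated with a minimal Python sketch. The abstract does not state which Kappa variant or which averaging scheme was used, so the sketch assumes Cohen's kappa for two annotators and macro-averaging over classes. The function names `cohen_kappa` and `macro_prf` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Assumption: exactly two annotators; the study may have used another variant.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def macro_prf(gold, predicted):
    """Macro-averaged precision, recall, and F1 (assumed averaging scheme)."""
    classes = sorted(set(gold) | set(predicted))
    p_sum = r_sum = f_sum = 0.0
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, predicted))
        fp = sum(g != c and p == c for g, p in zip(gold, predicted))
        fn = sum(g == c and p != c for g, p in zip(gold, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        p_sum, r_sum, f_sum = p_sum + precision, r_sum + recall, f_sum + f1
    k = len(classes)
    return p_sum / k, r_sum / k, f_sum / k


# Toy labels for illustration only; these are not data from the study.
annotator_1 = ["Treatment", "Treatment", "Diagnosis", "Diagnosis"]
annotator_2 = ["Treatment", "Treatment", "Diagnosis", "Treatment"]
kappa = cohen_kappa(annotator_1, annotator_2)
precision, recall, f1 = macro_prf(annotator_1, annotator_2)
```

With macro-averaging, the reported F1 is the mean of the per-class F1 scores, so it need not equal the harmonic mean of the averaged precision and recall.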

Introduction

The Internet is increasingly becoming a main resource for consumers seeking health information. As of December 2015, there were 152 million Internet health users in China, meaning that 22.1% of Chinese Internet users had looked online for health information and services [1]. Many studies have shown that online health information can influence consumers’ health-related attitudes and behaviors [2,3,4]. However, consumers often have difficulty expressing their information needs accurately in medical query terms and therefore fail to retrieve relevant health information [5,6]. Automatic question answering (QA) systems can help such users by responding with concise, correct answers produced using natural language processing techniques. In QA system development, question classification is crucial for identifying information needs and improving the accuracy of returned answers. Although the questions are domain-specific, they are asked by non-professionals, which makes the question classification task more challenging.
