Abstract

Objective: To explore the potential and accuracy of the generative dialogue artificial intelligence tool GPT‐4.0 in answering questions related to paediatric emergency appendicitis.

Methods: A cross‐sectional observational study design was used. We collected 134 appendicitis‐related questions from authoritative websites, such as Mayo Medical and APSA, covering all aspects of appendicitis and including both simple and complex questions. GPT‐4.0 generated answers to these questions, and three paediatric surgical experts then evaluated each answer for accuracy using a quality score ranging from 0 to 5.

Results: GPT‐4.0 achieved high accuracy on simple questions, with a mean quality score of 4.65 (standard deviation 0.51). For complex questions, the mean score was 3.77 (standard deviation 0.68), a significant difference between the two (P < .05). On clinical questions, the accuracy score of GPT‐4.0 was 4.00 (standard deviation 0.21). When answering actual questions from families of children with appendicitis, the accuracy score was 4.12 (standard deviation 0.59), falling between the scores for simple and complex questions and largely meeting the accuracy requirements of clinical questions. Notably, GPT‐4.0 demonstrated empathy in answering some questions, which may further enhance patient satisfaction.

Conclusion: GPT‐4.0 showed potential and accuracy in handling paediatric appendicitis questions, especially simple and clinical questions. However, improvements are still needed in handling complex questions and in updating information. Despite these limitations, the model is expected to improve the quality of medical services and enhance patient satisfaction.