Artificial intelligence (AI) platforms such as Chat Generative Pre-Trained Transformer (ChatGPT) (OpenAI, San Francisco, California, USA) have the capacity to answer health-related questions. It remains unknown whether AI can be a patient-friendly and accurate resource regarding third molar extraction. The purpose of this study was to determine the accuracy and readability of AI responses to common patient questions regarding third molar extraction. This was a cross-sectional, in silico assessment of the readability and soundness of a computer-generated report. The predictor variable was not applicable. Accuracy, or the ability to provide clinically correct and relevant information, was determined subjectively by 2 reviewers using a 5-point Likert scale, and objectively by comparing responses to American Association of Oral and Maxillofacial Surgeons (AAOMS) clinical consensus papers. Readability, or how easy a piece of text is to read, was assessed using the Flesch-Kincaid Reading Ease (FKRE) and Flesch-Kincaid Grade Level (FKGL); both are based on the mean number of syllables per word and words per sentence. To be deemed "readable," FKRE should be >60 and FKGL should be <8. Covariates were not applicable. Descriptive statistics were used to analyze the findings of this study. AI-generated responses were written above the recommended reading level for the average patient (FKRE: 52; FKGL: 10). The average Likert score was 4.36, suggesting that most responses were accurate, with only minor inaccuracies or missing information. The AI correctly deferred to the provider in instances where no definitive answer exists. Of the 19 responses that addressed content covered in AAOMS consensus papers, 18 aligned closely with them. None of the responses provided citations or references. AI provided mostly accurate responses, and content aligned closely with AAOMS guidelines. However, responses were too complex for the average third molar extraction patient and lacked citations and references. It is important for providers to educate patients on the utility of AI, and to decide whether to recommend it as a source of information. Ultimately, the best resource for answers remains the practitioner, because the AI platform lacks clinical experience.
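For readers unfamiliar with the two readability measures, the standard published Flesch-Kincaid formulas (not reproduced in the abstract itself) are:

\[ \mathrm{FKRE} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}} \]

\[ \mathrm{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59 \]

A higher FKRE score indicates easier text, while FKGL approximates the United States school grade needed to understand the passage; the reported values (FKRE 52, FKGL 10) therefore correspond to roughly a tenth-grade reading level, above the <8 grade level recommended for patient materials.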