Objective
Chat-based artificial intelligence programs like ChatGPT are reimagining how patients seek information. This study aims to evaluate the quality and accuracy of ChatGPT-generated answers to common patient questions about lung cancer surgery.

Methods
A 30-question survey of patient questions about lung cancer surgery was posed to ChatGPT in July 2023. The ChatGPT-generated responses were presented to 9 thoracic surgeons at 4 academic institutions, who rated the quality of each answer on a 5-point Likert scale. They also evaluated whether the response contained any inaccuracies and were prompted to submit free-text comments. Responses were analyzed in aggregate.

Results
For ChatGPT-generated answers, the average quality ranged from 3.1 to 4.2 of 5.0, indicating they were generally “good” or “very good.” No answer received a unanimous 1-star (poor quality) or 5-star (excellent quality) score. Minor inaccuracies were identified by at least 1 surgeon in 100% of the answers, and major inaccuracies in 36.6%. Regarding ChatGPT, 66.7% of surgeons thought it was an accurate source of information for patients. However, only 55.6% thought its answers were comparable with those given by experienced thoracic surgeons, and only 44.4% would recommend it to their patients. Common criticisms of ChatGPT-generated answers included lengthiness, lack of specificity regarding surgical care, and lack of references.

Conclusions
Chat-based artificial intelligence programs have the potential to become a useful information tool for patients undergoing lung cancer surgery. However, the quality and accuracy of ChatGPT-generated answers need improvement before thoracic surgeons would consider this method a primary education source for patients.