Abstract

e13610 Background: Artificial intelligence (AI)-driven tools such as ChatGPT have become widely available sources of online health information. Limited research has explored the congruity between AI-generated content and professional treatment guidelines. This study compares recommendations for cancer-related symptoms generated by ChatGPT with guidelines from the National Comprehensive Cancer Network (NCCN), a provider-focused source that requires users to log in or register to access its recommendations. Using a provider-focused source like NCCN provides a benchmark for assessing whether ChatGPT's recommendations align with the standards typically endorsed by clinicians.

Methods: We extracted treatment recommendations from four NCCN Supportive Care webpages (Cancer Pain, Antiemesis, Cancer-Related Fatigue, and Distress Management) and five subsections of the NCCN Palliative Care webpage (dyspnea, constipation, diarrhea, sleep disturbances, and anorexia/cachexia). We then entered "How can I reduce my cancer-related [symptom]" into ChatGPT 3.5 and extracted its recommendations. We calculated and compared word count and Flesch-Kincaid Grade Level readability for each NCCN and ChatGPT section, and completed a comparative content analysis focusing on recommendations for medications, consultations, and non-pharmacological strategies.

Results: Across the nine NCCN Supportive Care and Palliative Care sections, the mean word count was 2393.8 (SD=2601.4) and the mean Flesch-Kincaid Grade Level was 17.3 (SD=1.4), versus 382.4 (SD=29.6) and 11.6 (SD=0.8) for ChatGPT. The mean percent agreement between NCCN and ChatGPT recommendations was 44.6% (range 14.3%-81.8%). ChatGPT and NCCN shared fewer than half of their symptom-related recommendations in all sections except fatigue. NCCN offered specific medication recommendations in every section, whereas ChatGPT's recommendations lacked that specificity and often suggested no medications at all. ChatGPT recommended specific medications in the shortness of breath and diarrhea sections that NCCN did not recommend. Conversely, NCCN's guidelines often omitted recommendations related to spirituality or palliative care consults, areas that ChatGPT addressed.

Conclusions: While ChatGPT provides concise, accessible supportive care advice, including many non-medical support recommendations, its discrepancies with professional guidelines raise concerns about patient-facing symptom management recommendations. Overall, AI-generated content such as ChatGPT can provide preliminary information but should be used in conjunction with comprehensive, evidence-based guidance from healthcare professionals. Healthcare providers should work with AI developers to ensure data sources are high-quality and accurate.
