Abstract

Background: The advent of telemedicine, accelerated during the COVID-19 pandemic, offers a promising care modality, especially when combined with artificial intelligence (AI) tools such as ChatGPT 4.0. In this investigation, we compared the proficiency of ChatGPT with that of medical oncologists in the telemedicine-based management of metastatic prostate cancer (mPC).

Methods: This IRB-approved retrospective study compared the competencies of ChatGPT and oncologists in conducting telemedicine consultations for patients with mPC. Of 975 patients screened between April 1, 2022, and March 30, 2023, 102 met the inclusion criteria: a diagnosis of mPC, attendance at at least one telemedicine consultation during the specified period, and documentation available for two consecutive visits to enable analysis of treatment decisions and outcomes. ChatGPT was asked to pre-chart and to determine whether a face-to-face consultation was needed. Its clinical competence was assessed using the mini-Clinical Evaluation Exercise (miniCEX) and medical decision-making (MDM). Cohen's kappa was used to measure the level of agreement between ChatGPT and the oncologists on treatment decisions, and the Mann-Whitney U test was used to compare miniCEX and MDM scores.

Results: The majority of patients were White (97.06%), with a median age of 75 years (range: 53 to 99). Nearly all patients were diagnosed with adenocarcinoma (96.08%), with a median Gleason score of 7 (range: 6 to 10). The most prevalent metastatic sites were bone (47.4%) and lymph nodes (44.16%). An ECOG score of 0 was recorded in 26.88% of patients, a score of 1 in 54.84%, and a score greater than 1 in 18.28%. Common coexisting conditions included diabetes mellitus (11.11%), hypertension (29.82%), hyperlipidemia (24.56%), and depression (7.6%). The primary outcome was the concordance between ChatGPT and the oncologists on whether to continue or stop the current treatment. Sensitivity and specificity differed significantly between the clinicians and ChatGPT (chi-squared = 5.1, p = 0.02). Cohen's kappa showed moderate concordance (kappa = 0.43, p < 0.001). There was no difference in the number of diagnoses made by the two parties (p = 0.13 and 0.06, respectively). ChatGPT's median miniCEX score was 8 (SD = 0.59), and its median MDM length was 41 words (SD = 6.06). The average time saved by ChatGPT in pre-charting was 41 minutes (SD = 6).

Conclusions: ChatGPT showed moderate concordance with oncologists in the telemedicine-based management of mPC. Subsequent investigations are needed to explore its potential in healthcare.
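As a minimal sketch of the agreement analysis described in the Methods, the snippet below computes Cohen's kappa for binary continue-vs-stop treatment decisions. The decision vectors are hypothetical illustrations, not data from the study; the study itself reported kappa = 0.43 over its 102-patient cohort.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's marginals.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    labels = sorted(set(a) | set(b))
    # Observed agreement: fraction of items where the two raters agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    p_e = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical decisions for 10 patients: 1 = continue treatment, 0 = stop.
oncologist = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
chatgpt    = [1, 0, 0, 1, 1, 1, 1, 0, 1, 0]

print(round(cohen_kappa(oncologist, chatgpt), 2))  # → 0.35
```

Values near 0 indicate chance-level agreement and values near 1 near-perfect agreement; the 0.41 to 0.60 band is conventionally read as "moderate", which is how the study characterizes its kappa of 0.43.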
