BackgroundThose receiving the diagnosis of multiple sclerosis (MS) over the next ten years will predominantly be part of Generation Z (Gen Z). Recent observations within our clinic suggest that younger people with MS utilize online generative artificial intelligence (AI) platforms for personalized medical advice prior to their first visit with a specialist in neuroimmunology. The use of such platforms is anticipated to increase given the technology driven nature, desire for instant communication, and cost-conscious nature of Gen Z. Our objective was to determine if ChatGPT (Generative Pre-trained Transformer) could diagnose MS in individuals earlier than their clinical timeline, and to assess if the accuracy differed based on age, sex, and race/ethnicity. MethodsPeople with MS between 18 and 59 years of age were studied. The clinical timeline for people diagnosed with MS was retrospectively identified and simulated using ChatGPT-3.5 (GPT-3.5). Chats were conducted using both actual and derivatives of their age, sex, and race/ethnicity to test diagnostic accuracy. A Kaplan-Meier survival curve was estimated for time to diagnosis, clustered by subject. The p-value testing for differences in time to diagnosis was accomplished using a general Wilcoxon test. Logistic regression (subject-specific intercept) was used to capture intra-subject correlation to test the accuracy prior to and after the inclusion of MRI data. ResultsThe study cohort included 100 unique people with MS. Of those, 50 were members of Gen Z (38 female; 22 White; mean age at first symptom was 20.6 years (y) (standard deviation (SD)=2.2y)), and 50 were non-Gen Z (34 female; 27 White; mean age at first symptom was 37.0y (SD=10.4y)). In addition, a total of 529 people that represented digital simulations of the original cohort of 100 people (333 female; 166 White; 136 Black/African American; 107 Asian; 120 Hispanic, mean age at first symptom was 31.6y (SD=12.4y)) were generated allowing for 629 scripted conversations to be analyzed. The estimated median time to diagnosis in clinic was significantly longer at 0.35y (95% CI=[0.28, 0.48]) versus that by ChatGPT at 0.08y (95% CI=[0.04, 0.24]) (p<0.0001). There was no difference in the diagnostic accuracy between ages and by race/ethnicity prior to the inclusion of MRI data. However, prior to including the MRI data, males had a 47% less likely chance of a correct diagnosis relative to females (p=0.05). Post-MRI data inclusion within GPT-3.5, the odds of an accurate diagnosis was 4.0-fold greater for Gen Z participants, relative to non-Gen Z participants (p=0.01) with the diagnostic accuracy being 68% less in males relative to females (p=0.009), and 75% less for White subjects, relative to non-White subjects (p=0.0004). ConclusionAlthough generative AI platforms enable rapid information access and are not principally designed for use in healthcare, an increase in use by Gen Z is anticipated. However, the obtained responses may not be generalizable to all users and bias may exist in select groups.
Read full abstract