The Accuracy and Potential Racial and Ethnic Biases of GPT-4 in the Diagnosis and Triage of Health Conditions: Evaluation Study.

Naoki Ito, Tadahiro Goto, Mineto Fujisawa, Daisuke Kasugai, Ryo Ishizawa, Mikio Nakajima, Sakina Kadomatsu, Yusuke Tsugawa, Kiyomitsu Fukaguchi, Naoki Kanda

doi:10.2196/47532

Abstract

Whether GPT-4, a conversational artificial intelligence, can accurately diagnose and triage health conditions, and whether its decisions show racial and ethnic biases, remains unclear. We aimed to assess the accuracy of GPT-4 in the diagnosis and triage of health conditions and whether its performance varies by patient race and ethnicity.

We compared the performance of GPT-4 and physicians on 45 typical clinical vignettes, each with a correct diagnosis and triage level, in February and March 2023. For each vignette, GPT-4 and 3 board-certified physicians provided the most likely primary diagnosis and a triage level (emergency, nonemergency, or self-care). Independent reviewers rated each diagnosis as "correct" or "incorrect." The physician diagnosis was defined as the consensus of the 3 physicians. To evaluate whether GPT-4's performance varies by patient race and ethnicity, we added information on patient race and ethnicity to the clinical vignettes.

Diagnostic accuracy was comparable between GPT-4 and physicians: 97.8% (44/45; 95% CI 88.2%-99.9%) of diagnoses were correct for GPT-4 and 91.1% (41/45; 95% CI 78.8%-97.5%) for physicians (P=.38). GPT-4 provided appropriate reasoning for 97.8% (44/45) of the vignettes. The appropriateness of triage was also comparable between GPT-4 and physicians (GPT-4: 30/45, 66.7%; 95% CI 51.0%-80.0%; physicians: 30/45, 66.7%; 95% CI 51.0%-80.0%; P=.99). GPT-4's diagnostic performance did not vary among races and ethnicities (Black, White, Asian, and Hispanic), with an accuracy of 100% (95% CI 78.2%-100%); all P values for comparison with the GPT-4 output without race and ethnicity information were .99. Triage accuracy did not differ significantly when information on patients' race and ethnicity was added: it was 62.2% (95% CI 46.5%-76.2%; P=.50) for Black patients, 66.7% (95% CI 51.0%-80.0%; P=.99) for White patients, 66.7% (95% CI 51.0%-80.0%; P=.99) for Asian patients, and 62.2% (95% CI 46.5%-76.2%; P=.69) for Hispanic patients. P values were calculated by comparing the outputs with and without conditioning on race and ethnicity.

GPT-4's ability to diagnose and triage typical clinical vignettes was comparable to that of board-certified physicians, and its performance did not vary by patient race and ethnicity. These findings should be informative for health systems looking to introduce conversational artificial intelligence to improve the efficiency of patient diagnosis and triage.
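The abstract reports each proportion with a 95% CI and compares paired outputs on the same 45 vignettes, but it does not name the interval method or the statistical test. As a sketch under stated assumptions, the code below computes an exact (Clopper-Pearson) binomial interval, which reproduces the reported 88.2%-99.9% interval for 44/45, and a two-sided exact McNemar test on discordant pairs, one common choice for paired binary outcomes. Both method choices, and the function names `clopper_pearson` and `mcnemar_exact`, are assumptions for illustration, not the authors' documented analysis.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); 0.0 when k < 0."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(k, n) + 1))


def _bisect(f, target: float, increasing: bool, iters: int = 100) -> float:
    """Find p in [0, 1] where the monotone function f(p) crosses target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # For increasing f, a value below target means the root lies above mid;
        # for decreasing f, it means the root lies below mid.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact binomial CI for x successes in n trials (assumed method)."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    # Lower bound solves P(X >= x; p) = alpha/2; it is 0 when x == 0.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper bound solves P(X <= x; p) = alpha/2; it is 1 when x == n.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from discordant pair counts b and c.

    b and c count the paired items where only one of the two outputs was
    correct (assumed test; the abstract does not report these counts).
    """
    n = b + c
    if n == 0:
        return 1.0
    # Under H0, discordant pairs split as Binomial(n, 0.5); double the smaller tail.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    lo, hi = clopper_pearson(44, 45)
    print(f"{44 / 45:.1%} (95% CI {lo:.1%}-{hi:.1%})")  # 97.8% (95% CI 88.2%-99.9%)
    # Hypothetical discordant counts, for illustration only.
    print(mcnemar_exact(b=3, c=3))  # 1.0
```

Only the interval reproduction is checked against a reported figure; the McNemar call uses placeholder counts, because the test's P value depends on how many vignettes flipped between correct and incorrect, which the abstract does not give.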

Journal: JMIR Medical Education
Publication Date: Nov 2, 2023
Citations: 16
License type: CC BY

Similar Papers

Triage and Diagnostic Accuracy of Online Symptom Checkers: Systematic Review
Eva Riboli-Sasco ... Manisha Karki
Journal of Medical Internet Research | VOL. 25 | 02 Jun 2023

Association of Gender and Race With Allocation of Advanced Heart Failure Therapies
Khadijah Breathett ... Nancy K Sweitzer
JAMA Network Open | VOL. 3 | 21 Jul 2020

Sex and Race Biases in Allocation of Advanced Heart Failure Therapies
K Breathett ... J Crist
Health Services Research | VOL. 55 | 01 Aug 2020

Association of unconscious race and social class bias with vignette-based clinical assessments by medical students
Adil H Haider ... Pamela A Lipsett
JAMA | VOL. 306 | 07 Sep 2011
