Abstract

We aimed to assess the appropriateness of ChatGPT in answering questions related to prostate cancer (PCa) screening, comparing GPT-3.5 and GPT-4. A committee of five reviewers designed 30 questions related to PCa screening, categorized into three difficulty levels. The questions were posed identically to both models three times, varying the prompt. Each reviewer assigned a score for accuracy, clarity, and conciseness. Readability was assessed with the Flesch-Kincaid Grade (FKG) and Flesch Reading Ease (FRE). Mean scores were extracted and compared using the Wilcoxon test, and readability across the three prompts was compared by ANOVA. For GPT-3.5, the mean scores (SD) for accuracy, clarity, and conciseness were 1.5 (0.59), 1.7 (0.45), and 1.7 (0.49), respectively, for easy questions; 1.3 (0.67), 1.6 (0.69), and 1.3 (0.65) for medium; and 1.3 (0.62), 1.6 (0.56), and 1.4 (0.56) for hard. For GPT-4, they were 2.0 (0), 2.0 (0), and 2.0 (0.14) for easy questions; 1.7 (0.66), 1.8 (0.61), and 1.7 (0.64) for medium; and 2.0 (0.24), 1.8 (0.37), and 1.9 (0.27) for hard. GPT-4 outperformed GPT-3.5 on all three qualities and at all difficulty levels. The mean FKG for GPT-3.5 and GPT-4 answers was 12.8 (1.75) and 10.8 (1.72), respectively; the mean FRE was 37.3 (9.65) and 47.6 (9.88), respectively. The second prompt achieved better results in terms of clarity (all p < 0.05). GPT-4 displayed superior accuracy, clarity, conciseness, and readability compared with GPT-3.5. Although prompts influenced response quality in both models, their impact was significant only for clarity.
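For context, the readability metrics and statistical tests named above are standard and straightforward to reproduce. The Python sketch below is illustrative only, not the authors' pipeline: the syllable counter is a rough heuristic (published tools use dictionary-based counts), and all numeric inputs are hypothetical placeholders, not data from the study.

import re
from scipy.stats import wilcoxon, f_oneway

def count_syllables(word: str) -> int:
    # Crude vowel-group heuristic; adequate only for illustration.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    # Standard Flesch formulas from sentence and word statistics.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    wps = n_words / sentences      # words per sentence
    spw = syllables / n_words      # syllables per word
    fkg = 0.39 * wps + 11.8 * spw - 15.59        # Flesch-Kincaid Grade
    fre = 206.835 - 1.015 * wps - 84.6 * spw     # Flesch Reading Ease
    return fkg, fre

print(readability("Prostate cancer screening uses the PSA blood test."))

# Paired per-question quality scores (hypothetical values) compared
# between the two models, as in the study, with the Wilcoxon test:
gpt35_scores = [1.5, 1.3, 1.3, 1.6, 1.4, 1.7]
gpt4_scores  = [2.0, 1.7, 2.0, 1.9, 1.8, 2.0]
print(wilcoxon(gpt35_scores, gpt4_scores))

# Readability across the three prompt variants (hypothetical FRE
# values) compared by one-way ANOVA:
prompt1 = [38.0, 40.1, 36.5]
prompt2 = [47.2, 49.0, 46.1]
prompt3 = [41.5, 43.3, 40.2]
print(f_oneway(prompt1, prompt2, prompt3))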
