Fog Index Research Articles

PurposeThis study aimed to evaluate the reliability and readability of responses generated by two popular AI-chatbots, ‘ChatGPT-4.0’ and ‘Google Gemini’, to potential patient questions about PET/CT scans. Materials and methodsThirty potential questions for each of [18F]FDG and [68Ga]Ga-DOTA-SSTR PET/CT, and twenty-nine potential questions for [68Ga]Ga-PSMA PET/CT were asked separately to ChatGPT-4 and Gemini in May 2024. The responses were evaluated for reliability and readability using the modified DISCERN (mDISCERN) scale, Flesch Reading Ease (FRE), Gunning Fog Index (GFI), and Flesch-Kincaid Reading Grade Level (FKRGL). The inter-rater reliability of mDISCERN scores provided by three raters (ChatGPT-4, Gemini, and a nuclear medicine physician) for the responses was assessed. ResultsThe median [min-max] mDISCERN scores reviewed by the physician for responses about FDG, PSMA and DOTA PET/CT scans were 3.5 [2–4], 3 [3–4], 3 [3–4] for ChatPT-4 and 4 [2–5], 4 [2–5], 3.5 [3–5] for Gemini, respectively. The mDISCERN scores assessed using ChatGPT-4 for answers about FDG, PSMA, and DOTA-SSTR PET/CT scans were 3.5 [3–5], 3 [3–4], 3 [2–3] for ChatGPT-4, and 4 [3–5], 4 [3–5], 4 [3–5] for Gemini, respectively. The mDISCERN scores evaluated using Gemini for responses FDG, PSMA, and DOTA-SSTR PET/CTs were 3 [2–4], 2 [2–4], 3 [2–4] for ChatGPT-4, and 3 [2–5], 3 [1–5], 3 [2–5] for Gemini, respectively. The inter-rater reliability correlation coefficient of mDISCERN scores for ChatGPT-4 responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 0.629 (95% CI = 0,32−0,812), 0.707 (95% CI = 0.458−0.853) and 0.738 (95% CI = 0.519−0.866), respectively (p < 0.001). The correlation coefficient of mDISCERN scores for Gemini responses about FDG, PSMA, and DOTA-SSTR PET/CT scans were 0.824 (95% CI = 0.677−0.910), 0.881 (95% CI = 0.78−0.94) and 0.847 (95% CI = 0.719−0.922), respectively (p < 0.001). The mDISCERN scores assessed by ChatGPT-4, Gemini, and the physician showed that the chatbots' responses about all PET/CT scans had moderate to good statistical agreement according to the inter-rater reliability correlation coefficient (p < 0,001). There was a statistically significant difference in all readability scores (FKRGL, GFI, and FRE) of ChatGPT-4 and Gemini responses about PET/CT scans (p < 0,001). Gemini responses were shorter and had better readability scores than ChatGPT-4 responses. ConclusionThere was an acceptable level of agreement between raters for the mDISCERN score, indicating agreement with the overall reliability of the responses. However, the information provided by AI-chatbots cannot be easily read by the public.

Read full abstract

Introduction Osteoarthritis (OA) is an age-related degenerative joint disease. There is a 25% risk of symptomatic hip OA in patients who live up to 85 years of age. It can impair a person's daily activities and increase their reliance on healthcare services. It is primarily managed with education, weight loss and exercise, supplemented with pharmacological interventions. Poor health literacy is associated with negative treatment outcomes and patient dissatisfaction. A literature search found there are no previously published studies examining the readability of online information about hip OA. Objectives To assess the readability of healthcare websites regarding hip OA. Methods The terms "hip pain", "hip osteoarthritis", "hip arthritis", and "hip OA" were searched on Google and Bing. Of 240 websites initially considered, 74 unique websites underwent evaluation using the WebFX online readability software (WebFX®, Harrisburg, USA). Readability was determined using the Flesch Reading Ease Score (FRES), Flesch-Kincaid Reading Grade Level (FKGL), Gunning Fog Index (GFI), Simple Measure of Gobbledygook (SMOG), Coleman-Liau Index (CLI), and Automated Readability Index (ARI). In line with recommended guidelines and previous studies, FRES >65 or a grade level score of sixth grade and under was considered acceptable. Results The average FRES was 56.74±8.18 (range 29.5-79.4). Only nine (12.16%) websites had a FRES score >65. The average FKGL score was 7.62±1.69 (range 4.2-12.9). Only seven (9.46%) websites were written at or below a sixth-grade level according to the FKGL score. The average GFI score was 9.20±2.09 (range 5.6-16.5). Only one (1.35%) website was written at or below a sixth-grade level according to the GFI score.The average SMOG score was 7.29±1.41 (range 5.4-12.0). Only eight (10.81%) websites were written at or below a sixth-grade level according to the SMOG score. The average CLI score was 13.86±1.75 (range 9.6-19.7). All 36 websites were written above a sixth-grade level according to the CLI score. The average ARI score was 6.91±2.06 (range 3.1-14.0). Twenty-eight(37.84%) websites were written at or below a sixth-grade level according to the ARI score. One-sample t-tests showed that FRES (p<0.001, CI -10.2 to -6.37), FKGL (p<0.001, CI 1.23 to 2.01), GFI (p<0.001, CI 2.72 to 3.69), SMOG (p<0.001, CI 0.97 to 1.62), CLI (p<0.001, CI 7.46 to 8.27), and ARI (p<0.001, CI 0.43 to 1.39) scores were significantly different from the accepted standard. One-way analysis of variance (ANOVA) testing of FRES scores (p=0.009) and CLI scores (p=0.009) showed a significant difference between categories. Post hoc testing showed a significant difference between academic and non-profit categories for FRES scores (p=0.010, CI -15.17 to -1.47) and CLI scores (p=0.008, CI 0.35 to 3.29). Conclusions Most websites regarding hip OA are written above recommended reading levels, hence exceeding the comprehension levels of the average patient. Readability of these resources must be improved to improve patient access to online healthcare information which can lead to improved patient understanding of their own condition and treatment outcomes.

Read full abstract

Fog Index Research Articles

Related Topics

Articles published on Fog Index

Assessing the Responses of Large Language Models (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Retinopathy of Prematurity: A Study on Readability and Appropriateness.

Assessing the readability and quality of online written information on epistaxis.

Improving Biomedical Science Literacy and Patient-Directed Knowledge of Tuberculosis (TB): A Cross-Sectional Infodemiology Study Examining Readability of Patient-Facing TB Information.

Development and Validation of an Artificial Intelligence-Assisted Patient Education Material for Ostomy Patients: A Methodological Study.

Appropriateness and readability of Google Bard and ChatGPT-3.5 generated responses for surgical treatment of glaucoma.

Evaluating AI-Generated informed consent documents in oral surgery: A comparative study of ChatGPT-4, Bard gemini advanced, and human-written consents

Abstract B105: Quality and Accessibility of Online Information Regarding CAR-T Therapy in Hematologic Malignancies

Pillar 3: The impact of language complexity on the preferences of commercial bank website users

Evaluación de la fiabilidad y legibilidad de las respuestas de los chatbots como recurso de información al paciente para las exploraciones PET-TC más communes

Evaluation of the Appropriateness and Readability of ChatGPT-4 Responses to Patient Queries on Uveitis

Readability and Accessibility of Patient-Education Materials for Heart Failure in the United States

Readability of Patient-Facing Information of Antibiotics Used in the WHO Short 6-Month and 9-Month All Oral Treatment for Drug-Resistant Tuberculosis

Lessons to be learned when designing comprehensible patient-oriented online information about temporomandibular disorders.

Assessing readability of online patient educational material on concussion and return to play.

Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.

Does Political Uncertainty Obfuscate Narrative Disclosure?

Comparing patient education tools for chronic pain medications: Artificial intelligence chatbot versus traditional patient information leaflets.

Enhancing Readability of Online Patient-Facing Content: The Role of AI Chatbots in Improving Cancer Information Accessibility.

Quality, Reliability, Readability, and Accountability of Online Information on Leukocoria.

A Cross-Sectional Analysis of the Readability of Online Information Regarding Hip Osteoarthritis.

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Fog Index Research Articles

Related Topics

Articles published on Fog Index

Assessing the Responses of Large Language Models (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Retinopathy of Prematurity: A Study on Readability and Appropriateness.

Assessing the readability and quality of online written information on epistaxis.

Improving Biomedical Science Literacy and Patient-Directed Knowledge of Tuberculosis (TB): A Cross-Sectional Infodemiology Study Examining Readability of Patient-Facing TB Information.

Development and Validation of an Artificial Intelligence-Assisted Patient Education Material for Ostomy Patients: A Methodological Study.

Appropriateness and readability of Google Bard and ChatGPT-3.5 generated responses for surgical treatment of glaucoma.

Evaluating AI-Generated informed consent documents in oral surgery: A comparative study of ChatGPT-4, Bard gemini advanced, and human-written consents

Abstract B105: Quality and Accessibility of Online Information Regarding CAR-T Therapy in Hematologic Malignancies

Pillar 3: The impact of language complexity on the preferences of commercial bank website users

Evaluación de la fiabilidad y legibilidad de las respuestas de los chatbots como recurso de información al paciente para las exploraciones PET-TC más communes

Evaluation of the Appropriateness and Readability of ChatGPT-4 Responses to Patient Queries on Uveitis

Readability and Accessibility of Patient-Education Materials for Heart Failure in the United States

Readability of Patient-Facing Information of Antibiotics Used in the WHO Short 6-Month and 9-Month All Oral Treatment for Drug-Resistant Tuberculosis

Lessons to be learned when designing comprehensible patient-oriented online information about temporomandibular disorders.

Assessing readability of online patient educational material on concussion and return to play.

Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.

Does Political Uncertainty Obfuscate Narrative Disclosure?

Comparing patient education tools for chronic pain medications: Artificial intelligence chatbot versus traditional patient information leaflets.

Enhancing Readability of Online Patient-Facing Content: The Role of AI Chatbots in Improving Cancer Information Accessibility.

Quality, Reliability, Readability, and Accountability of Online Information on Leukocoria.

A Cross-Sectional Analysis of the Readability of Online Information Regarding Hip Osteoarthritis.