AI Language Research Articles

ObjectiveThis article presents a study comparing the responses of two AI chatbots, ChatGPT and ChatSonic, regarding inquiries about epilepsy. Overall, ChatGPT and ChatSonic are very similar in terms of their capabilities and limitations and they are the most widely used AI software. However, there are some key differences, such as their training data, supported languages, and pricing model. The study aims to assess the potential application of AI in patient counseling and decision-making regarding epilepsy treatment. MethodsThe study categorized the inquiries of patients about epilepsy into two groups: patient counseling and judgment. Ten questions were formulated within these categories. Two specialized doctors evaluated the reliability and accuracy of the chatbot replies using the Global Quality Scale (GQS) and a modified version of the DISCERN score. ResultsThe median value for GQS of 4.5 was given by Evaluator JC, and a median value for GQS of 4.0 was given by Evaluator VV. Furthermore, a median for RS of 5.0 was given by Evaluator JC, and a median for RS of 4.0 was given by Evaluator VV. The GQS data from Evaluators JC and VV have a Spearman correlation coefficient of -0.531, indicating an inversely proportional association, and a p-value of 0.016, indicating a statistically significant relationship between the variables. However, the correlation coefficient of RS between data by Evaluator JC and Evaluator VV is 0.368 which indicates the correlation is a directly proportional relationship, with a p-value of 0.110 which is not statistically significant, does not establish a relation between the variables. Weighted Kappa was used to study the agreement between the data. With a weighted kappa value of -0.318 and a 95 %CI of -0.570, -0.065 was obtained for GQS. This can help reject the null hypothesis indicating that the values by the Evaluator JC and Evaluator VV are statistically significant and has a negative agreement. However, a weighted kappa value of 0.1327 with a 95 %CI of -0.093, 0.359 obtained for RS, fails to reject the null hypothesis indicating that the values by the Evaluator JC and Evaluator VV are not significant and no agreement exists between the Evaluators. The results of this study suggest that both ChatGPT and ChatSonic have the potential to be valuable tools for epilepsy patients and their healthcare providers. However, it is important to note that the two evaluators had better agreement on the GQS scores than on the RS scores, suggesting that the GQS may be a more reliable measure of the quality of chatbot responses. ConclusionThe findings underscore the importance of collaboration among policymakers, healthcare professionals, and AI designers to ensure appropriate and safe utilization of AI chatbots in the healthcare domain. While AI chatbots can provide valuable information, it is crucial to acknowledge their limitations, including reliance on the training data and occasional factual errors. The study highlights the need for further testing and validation of AI language models in the management of epilepsy as it concludes.

Read full abstract

Introduction Ensuring access to high-quality information is paramount to facilitating informed surgical decision-making. The use of the internet to access health-related information is increasing, along with the growing prevalence of AI language models such as ChatGPT. We aim to assess the standard of AI-generated patient-facing information through a qualitative analysis of its readability and quality. Materials and methods We performed a retrospective qualitative analysis of information regarding three common vascular procedures: endovascular aortic repair (EVAR), endovenous laser ablation (EVLA), and femoro-popliteal bypass (FPBP). The ChatGPT responses were compared to patient information leaflets provided by the vascular charity, Circulation Foundation UK. Readability was assessed using four readability scores: the Flesch-Kincaid reading ease (FKRE) score, the Flesch-Kincaid grade level (FKGL), the Gunning fog score (GFS), and the simple measure of gobbledygook (SMOG) index. Quality was assessed using the DISCERN tool by two independent assessors. Results The mean FKRE score was 33.3, compared to 59.1 for the information provided by the Circulation Foundation (SD=14.5, p=0.025) indicating poor readability of AI-generated information. The FFKGL indicated that the expected grade of students likely to read and understand ChatGPT responses was consistently higher than compared to information leaflets at 12.7 vs. 9.4 (SD=1.9, p=0.002). Two metrics measure readability in terms of the number of years of education required to understand a piece of writing: the GFS and SMOG. Both scores indicated that AI-generated answers were less accessible. The GFS for ChatGPT-provided information was 16.7 years versus 12.8 years for the leaflets (SD=2.2, p=0.002) and the SMOG index scores were 12.2 and 9.4 years for ChatGPT and the patient information leaflets, respectively (SD=1.7, p=0.001). The DISCERN scores were consistently higher in human-generated patient information leaflets compared to AI-generated information across all procedures; the mean score for the information provided by ChatGPT was 50.3 vs. 56.0 for the Circulation Foundation information leaflets (SD=3.38, p<0.001). Conclusion We concluded that AI-generated information about vascular surgical procedures is currently poor in both the readability of text and the quality of information. Patients should be directed to reputable, human-generated information sources from trusted professional bodies to supplement direct education from the clinician during the pre-procedure consultation process.

Read full abstract

AI Language Research Articles

Related Topics

Articles published on AI Language

A cross-sectional study to assess response generated by ChatGPT and ChatSonic to patient queries about Epilepsy

Assessing the Viability and Effectiveness of Chatgpt Applications in the Customer Service Industry: A Study on Business Models and User Experience

Artificial Intelligence in Scoliosis Classification: An Investigation of Language-Based Models.

P22 Disrupting Health Economics: Automating Network Meta-Analyses With AI and Large Language Models

Customer journey optimisation using large language models: Best practices and pitfalls in generative AI

Development and evaluation of a software system for medical students to teach and practice anamnestic interviews with virtual patient avatars

AI Content Generation Technology based on Open AI Language Model

수학교육의 교수학적 실천과 연구 측면에서 AI 언어 모델 활용 방안 탐색

AI-Generated Information for Vascular Patients: Assessing the Standard of Procedure-Specific Information Provided by the ChatGPT AI-Language Model.

Transforming Gambling Harm Reduction in Youth: Leveraging AI Language Models for Personalized Intervention and Prevention

ChatGPT and medicine: how AI language models are shaping the future and health related careers.

ChatGPT-3.5 as writing assistance in students’ essays

Factors affecting accounting students’ misuse of chatgpt: an application of the fraud triangle theory

“Chatting with ChatGPT”: Analyzing the factors influencing users' intention to Use the Open AI's ChatGPT using the UTAUT model

A New Era for Data Analysis in Qualitative Research: ChatGPT!

Applications of AI in Computer Vision and NLP

Literature Review: Recent Advances in Computer Vision and Language AI

Application of ChatGPT in Higher Education and Research – A Futuristic Analysis

Emergent local structures in an ecosystem of social bots and humans on Twitter

Children of AI: A Protocol for Managing the Born-Digital Ephemera Spawned by Generative AI Language Models

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

AI Language Research Articles

Related Topics

Articles published on AI Language

A cross-sectional study to assess response generated by ChatGPT and ChatSonic to patient queries about Epilepsy

Assessing the Viability and Effectiveness of Chatgpt Applications in the Customer Service Industry: A Study on Business Models and User Experience

Artificial Intelligence in Scoliosis Classification: An Investigation of Language-Based Models.

P22 Disrupting Health Economics: Automating Network Meta-Analyses With AI and Large Language Models

Customer journey optimisation using large language models: Best practices and pitfalls in generative AI

Development and evaluation of a software system for medical students to teach and practice anamnestic interviews with virtual patient avatars

AI Content Generation Technology based on Open AI Language Model

수학교육의 교수학적 실천과 연구 측면에서 AI 언어 모델 활용 방안 탐색

AI-Generated Information for Vascular Patients: Assessing the Standard of Procedure-Specific Information Provided by the ChatGPT AI-Language Model.

Transforming Gambling Harm Reduction in Youth: Leveraging AI Language Models for Personalized Intervention and Prevention

ChatGPT and medicine: how AI language models are shaping the future and health related careers.

ChatGPT-3.5 as writing assistance in students’ essays

Factors affecting accounting students’ misuse of chatgpt: an application of the fraud triangle theory

“Chatting with ChatGPT”: Analyzing the factors influencing users' intention to Use the Open AI's ChatGPT using the UTAUT model

A New Era for Data Analysis in Qualitative Research: ChatGPT!

Applications of AI in Computer Vision and NLP

Literature Review: Recent Advances in Computer Vision and Language AI

Application of ChatGPT in Higher Education and Research – A Futuristic Analysis

Emergent local structures in an ecosystem of social bots and humans on Twitter

Children of AI: A Protocol for Managing the Born-Digital Ephemera Spawned by Generative AI Language Models