Use Of Artificial Intelligence In Medicine Research Articles

Purpose: Artificial intelligence (AI) has rapidly gained popularity with the growth of ChatGPT (OpenAI, San Francisco, USA) and other large-language-model chatbots, and these programs have tremendous potential to impact medicine. One important area of consequence in medicine and public health is that patients may use these programs to seek answers to medical questions. Despite the increasing use of AI chatbots by the public, little research has assessed the reliability of ChatGPT and alternative programs when they are queried for medical information. This study seeks to elucidate the accuracy and readability of AI chatbots in answering patient questions about urology. Because vasectomy is one of the most common urologic procedures, this study investigates AI-generated responses to frequently asked vasectomy-related questions, using five popular, free-to-access AI platforms.

Methods: Fifteen vasectomy-related questions were individually submitted to five AI chatbots from November to December 2023: ChatGPT (OpenAI, San Francisco, USA), Bard (Google Inc., Mountain View, USA), Bing (Microsoft, Redmond, USA), Perplexity (Perplexity AI Inc., San Francisco, USA), and Claude (Anthropic, San Francisco, USA). Responses from each platform were graded by two attending urologists, two urology research faculty, and one urology resident physician on a 6-point Likert scale (1, completely inaccurate, to 6, completely accurate) based on comparison with existing American Urological Association guidelines. The Flesch-Kincaid Grade Level (FKGL) and the Flesch Reading Ease score (FRES, 1-100) were calculated for each response. To assess differences in Likert, FRES, and FKGL scores, Kruskal-Wallis tests were performed in GraphPad Prism V10.1.0 (GraphPad, San Diego, USA) with alpha set at 0.05.

Results: ChatGPT provided the most accurate responses of the five AI chatbots, with an average Likert score of 5.04, followed by Microsoft Bing (4.91), Anthropic Claude (4.65), Google Bard (4.43), and Perplexity (4.41). All five chatbots scored, on average, 4.41 or higher, corresponding to a score of at least "somewhat accurate." Google Bard received the highest FRES (49.67) and the lowest FKGL (10.1) of the five chatbots. Anthropic Claude scored 46.7 on the FRES and 10.55 on the FKGL, Microsoft Bing scored 45.57 and 11.56, and Perplexity scored 36.4 and 13.29. ChatGPT had the lowest FRES (30.4) and the highest FKGL (14.2).

Conclusion: This study investigates the use of AI in medicine, specifically urology, and helps determine whether large-language-model chatbots can be reliable sources of freely available medical information. All five AI chatbots achieved, on average, at least "somewhat accurate" on a 6-point Likert scale. In terms of readability, all five had average FRES values below 50 and average FKGL values above a 10th-grade level. In this small-scale study, several significant differences were identified among the readability scores of the chatbots, but no significant differences were found among their accuracies. Thus, our study suggests that major AI chatbots may perform similarly in accuracy but differ in how easily the general public can comprehend their responses.
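
The abstract reports FRES and FKGL values but does not say which tool computed them. As a minimal sketch, the two standard Flesch formulas can be applied to raw word, sentence, and syllable counts as shown below. A higher FRES means easier text, and FKGL approximates a US school grade. The sentence splitter and the vowel-group syllable counter here are simplified assumptions for illustration only, so scores will differ somewhat from dedicated readability software.

```python
import re

def flesch_scores(words: int, sentences: int, syllables: int) -> tuple[float, float]:
    """Return (FRES, FKGL) from raw counts using the standard Flesch formulas."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    wps = words / sentences  # average sentence length in words
    spw = syllables / words  # average syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw  # higher = easier to read
    fkgl = 0.39 * wps + 11.8 * spw - 15.59     # approximate US school grade
    return fres, fkgl

def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Naive end-to-end scoring of a chatbot response (simplified tokenization)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_scores(len(words), len(sentences), syllables)
```

For example, `flesch_scores(100, 5, 150)` (20 words per sentence, 1.5 syllables per word) gives a FRES of about 59.6 and an FKGL of about 9.9. Long sentences and polysyllabic medical vocabulary push FRES down and FKGL up, which is consistent with the low FRES and high FKGL reported for the more verbose chatbots.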

Artificial intelligence (AI) has gained tremendous popularity recently, especially natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine could have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT's accuracy in self-diagnosis, no research has examined its precision or the degree to which it recommends medical consultation. The aim of this study was to evaluate ChatGPT's ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the degree to which it recommends medical consultation. Over a 5-day period, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as correct, partially correct, incorrect, or a differential diagnosis. The percentage of correct answers and the reproducibility were calculated. Reproducibility between days and between raters was calculated using the Fleiss κ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation, as defined by the study. The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratio of incorrect answers was 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively. The reproducibility between raters was 1.0, 0.1, 0.64, -0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases "essential," "recommended," "best," and "important" were used. Specifically, "essential" occurred in 4 of 125 answers, "recommended" in 12, "best" in 6, and "important" in 94. Additionally, 7 of the 125 answers did not include a recommendation to seek medical attention. The accuracy and reproducibility of ChatGPT in self-diagnosing five common orthopedic conditions were inconsistent. Accuracy could potentially be improved by adding symptoms that readily identify a specific location. Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in accurate self-diagnosis. Given the risk of harm from self-diagnosis without medical follow-up, it would be prudent for an NLP-based chatbot to include clear language alerting patients to seek expert medical opinions. We hope to shed further light on the use of AI in a future clinical study.
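
The reproducibility figures above are Fleiss κ coefficients. The abstract does not state how the 125 answers were arranged into subjects and raters, so the following is only a generic sketch of the standard Fleiss κ computation: each row lists the category labels that several raters (or days) assigned to one subject, κ compares observed pairwise agreement with the agreement expected from the overall category proportions, and values can be negative when agreement is worse than chance. The category labels and the handling of the all-one-category case are assumptions for illustration.

```python
from collections import Counter

def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for N subjects, each rated by the same number n of raters.

    ratings[i] holds the n category labels assigned to subject i.
    """
    n = len(ratings[0])
    if n < 2 or any(len(row) != n for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    N = len(ratings)
    counts = [Counter(row) for row in ratings]  # n_ij: raters putting subject i in category j
    categories = set().union(*counts)
    # p_j: share of all N * n ratings that fall in category j
    p = {c: sum(cnt[c] for cnt in counts) / (N * n) for c in categories}
    # P_i: fraction of agreeing rater pairs for subject i
    P_i = [(sum(v * v for v in cnt.values()) - n) / (n * (n - 1)) for cnt in counts]
    P_bar = sum(P_i) / N                     # mean observed agreement
    P_e = sum(pj * pj for pj in p.values())  # agreement expected by chance
    if P_e == 1.0:
        # Every rating is in one category, so kappa is 0/0. Returning 1.0 is a
        # convention (assumption); some software reports this case as undefined.
        return 1.0
    return (P_bar - P_e) / (1 - P_e)
```

With three hypothetical answers each labeled by three raters, `fleiss_kappa([["correct"] * 3, ["correct", "correct", "incorrect"], ["incorrect"] * 3])` returns about 0.55, while two raters who always disagree on a two-category scale give -1.0.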

Articles published on Use Of Artificial Intelligence In Medicine

Potential Applications and Ethical Considerations for Artificial Intelligence in Traumatic Brain Injury Management

Accuracy and Readability of Artificial Intelligence Chatbot Responses to Vasectomy-Related Questions: Public Beware.

Physicians' Attitudes Towards Artificial Intelligence: Results of the PEAK Project.

Ethical and Cultural Challenges of Implementing Artificial Intelligence in Medical Practices: A Multicultural Analysis

Knowledge and attitude of medical students towards artificial intelligence in ophthalmology in Riyadh, Saudi Arabia: a cross-sectional study.

The Use of AI in Medicine: Health Data, Privacy Risks and More

Ethical Dilemmas of Using Artificial Intelligence in Medicine.

Perceptions of undergraduate medical students on artificial intelligence in medicine: mixed-methods survey study from Palestine

Medical Expectations of Physicians on AI Solutions in Daily Practice: Cross-Sectional Survey Study.

Hierarchy of Ethical Principles for the use of Artificial Intelligence in Medicine and Healthcare

Proposing a Principle-Based Approach for Teaching AI Ethics in Medical Education.

The Knowledge of Students at Bursa Faculty of Medicine towards Artificial Intelligence: A Survey Study

Artificial Intelligence in Medicine: Cross-Sectional Study Among Medical Students on Application, Education, and Ethical Aspects.

Artificial intelligence in medicine: Ethical, social and legal perspectives

Artificial intelligence in medicine: Ethical, social and legal perspectives.

Artificial Intelligence in Medicine: Progress, Challenges, and Promises

The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study.

Evaluation of acceptance, attitude, and knowledge towards artificial intelligence and its application from the point of view of physicians and nurses: A provincial survey study in Iran: A cross-sectional descriptive-analytical study.

Why Should and Shouldn't We Use ChatGPT in Gastroenterology?

What Is Machine Learning, Artificial Neural Networks and Deep Learning? Examples of Practical Applications in Medicine.
