Human Users Research Articles

Abstract

Background: Medical trainees are increasingly using online chat-based artificial intelligence (AI) platforms as supplementary resources for board exam preparation and clinical decision support. Prior studies have evaluated the performance of AI chatbots, including ChatGPT, on general standardized tests such as the United States Medical Licensing Examination (USMLE), but little is known about their performance on subspecialty-focused exam questions, particularly those related to clinical management and treatment.

Objective: This study aims to evaluate the performance of ChatGPT version 4.0 on the cardiovascular questions from the Medical Knowledge Self-Assessment Program (MKSAP) 19, a widely used board exam preparation resource in the United States.

Methods: We submitted all cardiovascular questions from MKSAP 19, which cover a broad range of cardiology topics in multiple-choice format, to ChatGPT 4.0. Performance was compared against both the official MKSAP answer key and the average trainee scores obtained from the MKSAP website. Of the 129 questions, 4 were invalidated because of post-publication data and 18 were excluded because they relied on visual aids, leaving 107 questions for analysis.

Results: ChatGPT 4.0 correctly answered 93 of 107 questions (87% accuracy), compared with an average accuracy of 60% among all human users for the same questions (p<0.0001). ChatGPT accuracy rates for each question category (e.g., heart failure, electrophysiology) are provided in the figure. On the 14 questions that ChatGPT answered incorrectly, human users averaged 47% accuracy. All 14 incorrectly answered questions concerned clinical management and treatment (as opposed to diagnosis or epidemiology), and most of them (10/14) involved choosing a treatment medication or choosing between medications and interventions.

Conclusion: ChatGPT 4.0 exceeded the 70% accuracy threshold typically required to pass the internal medicine board certification exam but performed less well on questions about clinical management and treatment. This performance suggests that ChatGPT 4.0 could be an effective adjunct tool for cardiology education but raises some concerns about its use by trainees for clinical decision support.
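The question counts and accuracy reported in the cardiology abstract can be checked with a few lines of arithmetic. The Python sketch below recomputes the number of analyzed questions, the number answered incorrectly, and the accuracy. The abstract does not say which statistical test produced p<0.0001, so the p-value here is only an illustrative stand-in under a stated assumption: it treats the 60% human average as a fixed reference rate and computes an exact one-sided binomial tail. This is not necessarily the authors' analysis, and the constant and function names are hypothetical.

```python
from math import comb

# Counts reported in the abstract.
TOTAL_QUESTIONS = 129
INVALIDATED = 4        # invalidated because of post-publication data
VISUAL_EXCLUDED = 18   # excluded because they relied on visual aids
CORRECT = 93

analyzed = TOTAL_QUESTIONS - INVALIDATED - VISUAL_EXCLUDED  # 107
incorrect = analyzed - CORRECT                              # 14
accuracy = CORRECT / analyzed                               # rounds to 87%


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption (not from the study): treat the 60% human average as a fixed
# reference rate and ask how likely 93+ correct answers out of 107 would be
# for a responder who is correct 60% of the time.
HUMAN_RATE = 0.60
p_value = binomial_upper_tail(CORRECT, analyzed, HUMAN_RATE)

print(f"analyzed = {analyzed}, incorrect = {incorrect}")
print(f"accuracy = {accuracy:.1%}")
print(f"one-sided binomial p = {p_value:.2e}")
```

Treating the human average as a known constant ignores its own sampling variability, so a two-sample comparison would be the more careful choice if per-question human data were available.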

Scientific publications provide a wealth of peer-reviewed, high-quality data that have been maintained over time, resulting in data persistence. As data repositories with rich provenance information, publications are indispensable sources for integrating and extending networks of interlinked Findable, Accessible, Interoperable and Reusable (FAIR*1) bio/geodiversity data. In this way, they make pivotal fact- and knowledge-based contributions to applications that address the biodiversity crisis. However, mobilizing the data preserved in scientific publications is hindered by the different copyright legislation contexts that apply to publications versus the data they contain. Moreover, copyright legislation still lacks harmonization across jurisdictions, it is difficult to interpret, and its applicable national scope can be uncertain. We clarify and highlight that data within scientific publications are not copyrightable and thus can be openly and freely reused once legal access to their enclosing publication has been gained*2. To make publications as accessible as possible, a joint statement supported by the Biodiversity Heritage Library (BHL), the Consortium of European Taxonomic Facilities (CETAF) and the Society for the Preservation of Natural History Collections (SPNHC) (Benichou et al. 2023) recommends that authors and publishers license their works under CC-BY or, preferably, waive copyright (CC0). Explicitly associating a public domain mark (PDM; e.g., the PDM from Creative Commons) with their published data gives users certainty about reusability. Yet placing works and bio/geodiversity data in the public domain does not make them a free-for-all. We stress that data need to be associated with clear provenance information in line with scientific best practices and the scientific community's social norms.
This includes providing detailed attribution to the authors of cited works and reused data. Proposed data governance labels, for example labels modeled after the Local Contexts labels developed by the international Indigenous Peoples and Local Communities (IPLC) community, would enable authors to communicate social and ethical contexts and applicable rules to data users, helping to ensure the sustainability of a shared environmental and data commons. Categories of Local Contexts labels that are of interest and applicable in the sciences include, for example, those that communicate (1) correct citation information and a request for attribution when knowledge and/or data are reused (the Traditional Knowledge (TK) Attribution label), (2) an interest in being recognized and acknowledged owing to a significant relationship with, and responsibility for, samples and data (the Biocultural (BC) Provenance label), (3) the verification of the data and their context following a community protocol (TK Verified), (4) that non-commercial use (TK Non-Commercial/BC Non-Commercial) or (5) outreach activities (TK Outreach/BC Outreach) are generally permitted, while other uses require direct contact and engagement, or (6) an openness to collaboration and partnerships (TK Collaboration/BC Collaboration). There are concerns about the tension between pursuing open data (e.g., Anonymous 2014) to enable and promote open science (e.g., UNESCO 2021) while at the same time imposing restrictions on these data in the form of governance labels. Furthermore, although the reference of the publication through which data are published, together with the bibliographic references cited for specific data within that publication, provides sufficient information for attribution and provenance, much more fine-grained and nuanced contextual information (e.g., in the form of metadata) is needed to ensure responsible reuse.
Such context-providing metadata unlock the full potential of the data and enable their reuse. This context can be supplied using machine-actionable markup tags combined with human-readable labels that inform both machines and human users about the semantics of the data, as well as about the ethical and social dimensions that govern their responsible and sustainable reuse. Future work is needed to discover, differentiate and define the quality and scope of the contexts that are necessary and sufficient for fully and responsibly reusing the data in different situations.
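The proposal to pair machine-actionable tags with human-readable labels can be illustrated with a small data structure. The Python sketch below is hypothetical and is not a Local Contexts API or schema: `DataRecord`, `Label`, `PERMITTED_USE`, `needs_contact`, and the JSON field names are invented for illustration. The label names are the categories listed in the abstract, and the rule that non-commercial or outreach labels permit those uses while other uses need direct contact follows the abstract's description. The rule that a record carrying no such label imposes no contact requirement is an added assumption.

```python
import json
from dataclasses import dataclass, field
from enum import Enum


class Label(Enum):
    """Label categories named in the abstract (modeled on Local Contexts labels)."""

    TK_ATTRIBUTION = "TK Attribution"
    BC_PROVENANCE = "BC Provenance"
    TK_VERIFIED = "TK Verified"
    TK_NON_COMMERCIAL = "TK Non-Commercial"
    BC_NON_COMMERCIAL = "BC Non-Commercial"
    TK_OUTREACH = "TK Outreach"
    BC_OUTREACH = "BC Outreach"
    TK_COLLABORATION = "TK Collaboration"
    BC_COLLABORATION = "BC Collaboration"


# Labels that mark a kind of use as generally permitted; per the abstract,
# other uses then require direct contact and engagement.
PERMITTED_USE = {
    Label.TK_NON_COMMERCIAL: "non-commercial",
    Label.BC_NON_COMMERCIAL: "non-commercial",
    Label.TK_OUTREACH: "outreach",
    Label.BC_OUTREACH: "outreach",
}


@dataclass
class DataRecord:
    """Hypothetical record: one datum plus provenance, rights, and labels."""

    value: str
    publication: str  # reference of the enclosing publication
    cited_works: list[str] = field(default_factory=list)
    rights: str = "PDM"  # e.g. "PDM", "CC0", or "CC-BY"
    labels: set[Label] = field(default_factory=set)

    def needs_contact(self, use: str) -> bool:
        """Return True if `use` falls outside the uses the labels permit.

        Assumption: a record carrying no permitted-use label imposes no
        contact requirement.
        """
        permitted = {PERMITTED_USE[lab] for lab in self.labels if lab in PERMITTED_USE}
        return bool(permitted) and use not in permitted

    def to_machine_tags(self) -> str:
        """Machine-actionable view: JSON with label names sorted."""
        return json.dumps(
            {
                "value": self.value,
                "publication": self.publication,
                "cited_works": self.cited_works,
                "rights": self.rights,
                "labels": sorted(lab.value for lab in self.labels),
            },
            sort_keys=True,
        )

    def to_human_label(self) -> str:
        """Human-readable view of provenance, rights, and labels."""
        labels = ", ".join(sorted(lab.value for lab in self.labels)) or "none"
        sources = "; ".join([self.publication, *self.cited_works])
        return f"Source: {sources} | Rights: {self.rights} | Labels: {labels}"


# Usage with a hypothetical datum and publication reference.
record = DataRecord(
    value="specimen count: 12",
    publication="hypothetical publication reference",
    labels={Label.TK_ATTRIBUTION, Label.TK_NON_COMMERCIAL},
)
print(record.needs_contact("non-commercial"))  # False
print(record.needs_contact("commercial"))  # True
print(record.to_human_label())
```

Generating both the JSON tags and the human-readable summary from a single record keeps the two views consistent: provenance, rights, and labels are stated once and rendered for machines and people alike.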

Articles published on Human Users

Making New Connections: LLMs as Puzzle Generators for the New York Times' Connections Word Game

NIMG-63. LEVERAGING LLMS FOR ACCURATE DIFFERENTIATION OF RADIATION NECROSIS AND TUMOR PROGRESSION IN BRAIN MRI REPORTS: A STUDY ON AUTOMATED SCORING AND CLINICAL IMPLICATIONS

Creativity, credit, and copyright in the age of artificial art

Beyond Preferences in AI Alignment

Research on CAPTCHA recognition technology based on deep learning

A Roadmap of Explainable Artificial Intelligence: Explain to Whom, When, What and How?

Interpreting Bar Charts: Effects of 3D Depth Cues on Human Gaze and User Understanding

Il progetto di paesaggio come teatro di coesistenza tra specie. Parc Martin-Luther-King a Parigi (Landscape design as a theater of coexistence between species: Parc Martin-Luther-King in Paris)

Performance evaluation of ChatGPT 4.0 on cardiovascular questions from the medical knowledge self-assessment program

Does Prosocial Automation Increase Driver’s Well-being?

Beyond humanism: telling response-able stories about significant otherness in human-chatbot relations.

A Human-Centered View of Continual Learning: Understanding Interactions, Teaching Patterns, and Perceptions of Human Users Toward a Continual Learning Robot in Repeated Interactions

LIMIT: Learning Interfaces to Maximize Information Transfer

Self‐Deception in Human–Sex Robot Intimacy

Non-Copyrightability of Data in Scientific Publications: A Free-for-All or a Global Commons Partnership?

Multimodal emotion recognition: A comprehensive review, trends, and challenges

WiCAM2.0: Imperceptible and Targeted Attack on Deep Learning based WiFi Sensing

Explainable AI based Predictions for Workpiece Quality

Team up with AI or Human? Investigating Candidates’ Self-Categorization as Fluidity and Ingroup-Serving Attribution When Judged by a Human–AI Hybrid Jury

LongT5Rank: A Novel Integrated Hybrid Approach for Text Summarisation
