Generative Pre-trained Transformer Research Articles

The rapid advancement of generative artificial intelligence (AI) in education and the workforce requires students to acquire critical competencies, including digital literacy, data integrity, and ethical AI use, to navigate an increasingly AI-driven world. Academic staff, in turn, must be upskilled to guide students effectively in the ethical and practical applications of AI. This project examines how a private higher education (HE) institute has modified its Quality Assessment Framework (QAF) to incorporate an AI competence dimension, improve HE assessment, and foster students' competencies. This new dimension complements the existing QAF elements (Intellectual Quality, Significance, and Student Support) to ensure a holistic approach to preparing students for AI-integrated academic and professional environments. The project's framework and tools, such as the enhanced QAF and an AI-driven Generative Pre-trained Transformer (GPT) assessment review tool, support educators in adapting curriculum and assessment design and help them integrate AI competencies into the curriculum. The tools were developed from established frameworks, particularly the QAF (Gore et al., 2009) and the AI Assessment Scale (AIAS) by Perkins et al. (2024), which categorises AI use across five levels, guides assessment reviews, and ensures consistent standards for AI integration in student work. A trial conducted across selected disciplines at the institute involved academic staff as assessment reviewers and used the GPT tool to streamline the assessment review process by aiding in feedback provision, coding, and suggestions for improvement. The tool's evaluation combines qualitative and quantitative methods: academic feedback on usability, clarity, and effectiveness, and comparative studies of review time and quality before and after GPT integration.
Usability testing evaluates workflow compatibility, while academic performance data provide insight into the tool's impact on student outcomes. Initial results were also examined to gauge the tool's success in aligning assessments with institutional goals and fostering a comprehensive understanding of AI competence. Developing and implementing these tools has presented challenges, including the complexity of aligning them with diverse curriculum needs across disciplines, the time investment required for staff training, and the need for continuous updates to keep pace with AI technology. Addressing these challenges has been crucial to ensuring the effectiveness and sustainability of these solutions. This digital poster presents the project's development process, trial outcomes, challenges encountered, and future directions for integrating AI competence into higher education assessments.
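The abstract does not describe how the GPT assessment review tool is implemented. As a purely hypothetical sketch, the Python below shows one way a review request could pair an assessment task with the enhanced QAF dimensions and its permitted AIAS level before being sent to a GPT model. The class, function and field names, the label "AI Competence", and the assumption that higher AIAS levels permit more AI use are illustrative choices, not the institute's design; no model API is called.

```python
from dataclasses import dataclass

# Hypothetical encoding of the enhanced QAF: the three existing elements named in
# the abstract plus the new AI competence dimension (the label is assumed).
QAF_DIMENSIONS = (
    "Intellectual Quality",
    "Significance",
    "Student Support",
    "AI Competence",
)


@dataclass
class AssessmentReviewRequest:
    """Illustrative input for a GPT-assisted assessment review (all names hypothetical)."""

    title: str
    task_description: str
    # Permitted AI use on the five-level AIAS; assumed to rise with the extent of AI use.
    aias_level: int
    dimensions: tuple = QAF_DIMENSIONS

    def __post_init__(self):
        # Reject anything that is not a whole level on the five-level scale.
        if not isinstance(self.aias_level, int) or not 1 <= self.aias_level <= 5:
            raise ValueError("aias_level must be an integer from 1 to 5")


def build_review_prompt(request: AssessmentReviewRequest) -> str:
    """Assemble a plain-text review prompt; sending it to a model is out of scope."""
    lines = [
        "You are assisting an academic staff member reviewing an assessment task.",
        f"Assessment: {request.title}",
        f"Task description: {request.task_description}",
        f"Permitted AI use: AIAS level {request.aias_level} of 5.",
        "For each dimension below, code how well the task addresses it,",
        "give brief feedback, and suggest one improvement:",
    ]
    lines.extend(f"- {dimension}" for dimension in request.dimensions)
    return "\n".join(lines)


if __name__ == "__main__":
    example = AssessmentReviewRequest(
        title="Reflective report",
        task_description="Critically reflect on a workplace AI tool (1,500 words).",
        aias_level=3,
    )
    print(build_review_prompt(example))
```

Keeping the framework dimensions and the AIAS level as structured fields, rather than free text, is one way a tool could apply the same review criteria consistently across disciplines.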

Background: Paediatric cardiology presents unique challenges: diverse and complex cases, a limited evidence base, and the need for multi-expert involvement in decision-making. In this context, large language models (LLMs) based on the generative pre-trained transformer (GPT) architecture offer a potential avenue for providing complex information and clinical decision support.
Purpose: This study evaluates the quality of three GPT-based LLMs in answering complex medical questions, including a state-of-the-art preview model that incorporates the German paediatric cardiology guidelines.
Methods: Seven paediatric cardiologists and paediatric cardiac surgeons generated 72 questions, including complex questions and medical cases with associated questions. The questions were categorized by difficulty and by the knowledge required (factual and experience-based, or mostly experience-based). We submitted the questions to three LLMs: GPT-3.5, GPT-4, and a GPT-4 Turbo preview. The GPT-4 Turbo preview was customized by giving it access, through a retrieval function, to all guidelines of the German Society for Paediatric Cardiology. Using a single complex instruction for all questions, we prompted the LLMs to provide precise, detailed, expert-level responses. Experts rated each model's responses for relevance, factual accuracy, severity of possible harm, completeness, superfluous content, and age-related appropriateness on a scale from 0 (very bad) to 7 (very good). Differences were tested with the Kruskal-Wallis test in SPSS version 28.
Results: All models performed well across the tested dimensions. The figures show the average ratings (Figure 1, Figure 2A) and highlight in bold the differences that remain significant after Bonferroni correction (Figure 2B). The GPT-4 Turbo preview with guideline retrieval gave significantly more relevant (average rating [AR] 5.94, meaning mostly relevant), accurate (AR 5.6, meaning between somewhat and mostly accurate) and complete (AR 5, meaning fairly complete) answers than GPT-3.5 and GPT-4. Ratings did not differ significantly by difficulty level or question type. Relevance ratings were slightly better for factual questions (AR 5.7) than for those requiring more experience-based knowledge (AR 5.3). Although GPT-4 had higher average scores than GPT-3.5 in all dimensions except superfluous content, these differences were not statistically significant. All models had considerable difficulty with the age-related aspects of the questions (AR 4.06-4.45, p=0.455).
Conclusion: This study highlights the potential and limitations of AI language models in addressing complex medical questions in fields characterized by highly individualized decision-making. The findings support developing more specialized AI tools in medicine, tailored to specific medical fields and patient age groups.
Figure 1: Average ratings of LLMs
Figure 2: Rating differences between LLMs
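The Methods compare three models' 0-7 ratings with the Kruskal-Wallis test and report pairwise differences after Bonferroni correction. The Python sketch below shows what those two steps compute; it is a minimal illustration, not the authors' SPSS analysis. The helper names are hypothetical, the p-value uses the usual chi-square approximation for H with 2 degrees of freedom (whose upper tail has the closed form exp(-H/2), valid only for three groups), and because the abstract does not say which pairwise test produced the p-values that were corrected, the Bonferroni step is shown on invented p-values. All ratings below are invented, not study data.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = rank_with_ties(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction factor: 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    # If every rating is identical there is no variation to test.
    return h / correction if correction > 0 else 0.0


def chi2_sf_df2(x):
    """Upper-tail chi-square probability for 2 degrees of freedom (three groups)."""
    return math.exp(-x / 2)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Invented 0-7 ratings for three models; NOT data from the study.
    model_a = [4, 5, 3, 5, 4, 6]
    model_b = [5, 5, 4, 6, 5, 6]
    model_c = [6, 7, 6, 7, 5, 7]
    h = kruskal_wallis_h(model_a, model_b, model_c)
    print(f"H = {h:.3f}, p = {chi2_sf_df2(h):.4f}")
    # Invented raw p-values for the three pairwise comparisons.
    print(bonferroni([0.004, 0.030, 0.200]))
```

Ratings on a 0-7 scale produce many ties, so the tie correction matters here: the correction factor is at most 1, so ignoring ties would make the test slightly conservative.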

Articles published on Generative Pre-trained Transformer

Conversational LLM Chatbot ChatGPT-4 for Colonoscopy Boston Bowel Preparation Scoring: An Artificial Intelligence-to-Head Concordance Analysis

Transformative Assessment Review

Performance of ChatGPT in emergency medicine residency exams in Qatar: A comparative analysis with resident physicians

Global insights: ChatGPT's influence on academic and research writing, creativity, and plagiarism policies

Artificial intelligence as a modality to enhance the readability of neurosurgical literature for patients

Assessing the need for coronary angiography in high-risk non-ST-elevation acute coronary syndrome patients using artificial intelligence and computed tomography

Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools

Unravelling ChatGPT's potential in summarising qualitative in-depth interviews

Utility of Generative Artificial Intelligence for Patient Care Counseling for Mandibular Fractures

High performance P300 spellers using GPT2 word prediction with cross-subject training

Can GPT-4 learn to analyse moves in research article abstracts?

Ontology-assisted GPT-based building performance simulation and assessment: Implementation of multizone airflow simulation

Enhancing rumor detection with data augmentation and generative pre-trained transformer

Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans

Metadata schema for virtual building models in digital twins: VB schema implemented in GPT-based applications

Exploring Korean Medicine professions' perspectives on the applicability of ChatGPT in facial palsy practice: A web-based survey

Assessing the Role of the Generative Pretrained Transformer (GPT) in Alzheimer's Disease Management: Comparative Study of Neurologist- and Artificial Intelligence-Generated Responses

Accuracy and Completeness of Bard and Chat-GPT 4 Responses for Questions Derived from the International Consensus Statement on Endoscopic Skull-Base Surgery 2019

Evaluating large language models in theory of mind tasks

A guideline-informed language model for paediatric cardiology demonstrates high performance in answering complex medical questions
