Correct Errors Research Articles

BackgroundChatGPT, a publicly available artificial intelligence (AI) large language model, has allowed for sophisticated AI technology on demand. Indeed, use of ChatGPT has already begun to make its way into medical research. However, the medical community has yet to understand the capabilities and ethical considerations of AI within this context, and unknowns exist regarding ChatGPT’s writing abilities, accuracy, and implications for authorship. ObjectivesWe hypothesize that human reviewers and AI detection software differ in their ability to correctly identify original published abstracts and AI-written abstracts in the subjects of Gynecology and Urogynecology. We additionally suspect that concrete differences in writing errors, readability, and perceived writing quality exist between original and AI-generated text. Study DesignTwenty-five articles published in high impact medical journals and a collection of Gynecology and Urogynecology journals were selected. ChatGPT was prompted to write 25 corresponding AI-generated abstracts, providing the abstract title, journal-dictated abstract requirements, and select original results. The original and AI-generated abstracts were reviewed by blinded Gynecology and Urogynecology faculty and fellows to identify the writing as original or AI-generated. All abstracts were analyzed by publicly available AI detection software GPTZero, Originality, and Copyleaks and were assessed for writing errors and quality by AI writing assistant Grammarly. ResultsOne hundred fifty-seven reviews of 25 original and 25 AI-generated abstracts were conducted by 26 faculty and 4 fellows. Fifty-seven percent of original abstracts and 42.3% of AI-generated abstracts were correctly identified for an average of 49.7% across all abstracts. All three AI detectors rated the original abstracts as less likely be AI-written than the ChatGPT-generated abstracts (GPTZero 5.8 vs 73.3%, p<0.001; Originality 10.9 vs 98.1%, p<0.001; Copyleaks 18.6 vs 58.2%, p<0.001). The performance of the three AI detection software differed when analyzing all abstracts (p=0.03), original abstracts (p<0.001), and AI-generated abstracts (p<0.001). Grammarly text analysis identified more writing issues and correctness errors in original than AI abstracts, including lower Grammarly score reflective of poorer writing quality (82.3 vs 88.1, p=0.006), more total writing issues (19.2 vs 12.8, p<0.001), critical issues (5.4 vs 1.3, p<0.001), confusing words (0.8 vs 0.1, p=0.006), misspelled words (1.7 vs 0.6, p=0.02), incorrect determiner use (1.2 vs 0.2, p=0.002), and comma misuse (0.3 vs 0.0, p=0.005). ConclusionsHuman reviewers are unable to detect the subtle differences between human and ChatGPT-generated scientific writing due to AI’s ability to generate tremendously realistic text. AI detection software improve identification of AI-generated writing but still lack complete accuracy and require programmatic improvements in order to achieve optimal detection. As reviewers and editors may be unable to reliably detect AI-generated pieces, clear guidelines for reporting AI use by authors and implementing AI detection software in the review process will need to be established as AI chatbots gain more widespread use.

Read full abstract

Student surveys with Likert scales and open responses are key to gauging the student experience in educational institutions. However, the thematic analysis of open responses is time-consuming, delaying feedback. This study aims to evaluate the effcacy of ChatGPT-4, a generative AI large language model (LLM) to streamline thematic analysis of student perception surveys. We hypothesise that LLMs can expedite the process, however, human intervention remains essential. The study focused on a 2nd-year physiology course’s and evaluated comparing online vs face-to-face (F2F) delivery, to determine if practical classes could successfully be delivered to students online without compromising the delivery of the desired skills and learning outcomes. Data from six cohorts were included (2019-2022); three semesters online and three F2F. Overall grades, and grades from individual written assessments requiring data analysis and critical thinking showed no difference between the different delivery modes, indicating that major learning outcomes are maintained in online delivery. Student perception was analysed from an online cohort (Semester 2, 2022). Analysis of the Likert data from the student survey from an online cohort (response rate: 40/202) found that students strongly agreed that the class was enjoyable (83% agreement) and the online tools and software were easy to use (83% agreement). Thematic analysis was performed on the open text responses using a LLM (ChatGPT-4) guided by a structured thematic analysis framework and was conducted in three phases: coding responses, collating codes into themes, and visualizing these themes. Each phase required precise prompt engineering to ensure the outputs were accurate and relevant. Thematic analysis using ChatGPT-4 identified that students enjoyed the social aspects of the teamwork and collaboration. The students found the online and software tools easy to use due to rapid feedback from instructors. Altogether, this produced a positive experience in their online learning experiments. A significant advantage to using ChatGPT-4 is the rapid processing of the thematic analysis and alleviate the burdensome aspects of qualitative analysis. Thus, allowing for the timely extraction of nuanced findings provided by qualitative data, and ensuring that student feedback can be effectively addressed. While the results showed that ChatGPT-4 was largely successful in processing the qualitative data, human oversight was necessary to correct minor errors and ensure logical consistency. In addition, LLMs like ChatGPT-4 cannot operate in isolation; human involvement are imperative in making evaluative judgments and checking for hallucinations. Nevertheless, we present a framework for a collaborative human-LLM approach to qualitative analysis of student evaluations to provide more timely feedback and action. The increase rapidity in feedback will help alleviate the student believe that their feedback goes unread and unheeded, thereby improving student outcomes. This is the full abstract presented at the American Physiology Summit 2024 meeting and is only available in HTML format. There are no additional versions or additional content available for this abstract. Physiology was not involved in the peer review process.

Read full abstract

Correct Errors Research Articles

Related Topics

Articles published on Correct Errors

Human vs machine: identifying ChatGPT-generated abstracts in Gynecology and Urogynecology

Fireball symmetry and its influence on perspective error from thermography data

Leveraging the power of generative AI: a case study on feedback analysis of student evaluation in an undergraduate physiology practical course

Material Characterization Augmented with Artificial Intelligence.

A family of permutationally invariant quantum codes

Exploring Undergraduate Students’ Viewpoints on Corrective Feedback Implementations in Interpreting

TQRFormer: Tubelet query recollection transformer for action detection

Integral inequalities of Ostrowski type for two kinds of s-logarithmically convex functions

Tapping into the Pedagogical Potential of infinigoChatIC: Evidence from iWrite Scoring and Comments and Lu & Ai’s Linguistic Complexity Analyzer

What can writing-process data add to the assessment of spelling difficulties?

Stitching interferometry using alternating calibration of positioning and systematic errors

Comparative Analysis between Vietnamese Reduplicative Words with Hmong Language Reduplicative Ones to Propose Measures to Correct Mistakes in using Vietnamese Reduplicative Words for Hmong Primary School Pupils in Vietnam

Laboratory study of aberration calculation in underwater turbulence using Shack-Hartmann wavefront sensor and Zernike polynomials

DiSTNet2D: Leveraging Long-Range Temporal Information for Efficient Segmentation and Tracking

E-PVT: enhanced position-velocity-time scheduler for computer-controlled optical finishing with comprehensive considerations of dynamics constraints, continuity and efficiency

Scaling and merging macromolecular diffuse scattering with mdx2.

InAs quantum dots with a narrow photoluminescence linewidth for a lower threshold current density in 1.55 µm lasers: erratum

Refractive Lens Exchange after Implanted Collamer Lens: A Case Report

A Simple Statistical Postprocessing Scheme for Enhancing the Skill of Seasonal SST Predictions in the Tropics

Ionospheric corrections tailored to Galileo HAS: validation with single-epoch navigation

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Correct Errors Research Articles

Related Topics

Articles published on Correct Errors

Human vs machine: identifying ChatGPT-generated abstracts in Gynecology and Urogynecology

Fireball symmetry and its influence on perspective error from thermography data

Leveraging the power of generative AI: a case study on feedback analysis of student evaluation in an undergraduate physiology practical course

Material Characterization Augmented with Artificial Intelligence.

A family of permutationally invariant quantum codes

Exploring Undergraduate Students’ Viewpoints on Corrective Feedback Implementations in Interpreting

TQRFormer: Tubelet query recollection transformer for action detection

Integral inequalities of Ostrowski type for two kinds of s-logarithmically convex functions

Tapping into the Pedagogical Potential of infinigoChatIC: Evidence from iWrite Scoring and Comments and Lu &amp; Ai’s Linguistic Complexity Analyzer

What can writing-process data add to the assessment of spelling difficulties?

Stitching interferometry using alternating calibration of positioning and systematic errors

Comparative Analysis between Vietnamese Reduplicative Words with Hmong Language Reduplicative Ones to Propose Measures to Correct Mistakes in using Vietnamese Reduplicative Words for Hmong Primary School Pupils in Vietnam

Laboratory study of aberration calculation in underwater turbulence using Shack-Hartmann wavefront sensor and Zernike polynomials

DiSTNet2D: Leveraging Long-Range Temporal Information for Efficient Segmentation and Tracking

E-PVT: enhanced position-velocity-time scheduler for computer-controlled optical finishing with comprehensive considerations of dynamics constraints, continuity and efficiency

Scaling and merging macromolecular diffuse scattering with mdx2.

InAs quantum dots with a narrow photoluminescence linewidth for a lower threshold current density in 1.55 µm lasers: erratum

Refractive Lens Exchange after Implanted Collamer Lens: A Case Report

A Simple Statistical Postprocessing Scheme for Enhancing the Skill of Seasonal SST Predictions in the Tropics

Ionospheric corrections tailored to Galileo HAS: validation with single-epoch navigation

Tapping into the Pedagogical Potential of infinigoChatIC: Evidence from iWrite Scoring and Comments and Lu & Ai’s Linguistic Complexity Analyzer