Senior Physician Research Articles

Artificial intelligence (AI), especially OpenAI's Chat Generative Pretrained Transformer (ChatGPT), a large language model, has permeated academia. However, little has been reported on its use in medical research. The objective of this study was to assess a chatbot's capability to generate and grade medical research abstracts. In this cross-sectional study, ChatGPT versions 3.5 and 4.0 (referred to as chatbot 1 and chatbot 2) were coached to generate 10 abstracts by providing background literature, prompts, analyzed data for each topic, and 10 previously presented, unassociated abstracts to serve as models. The study was conducted between August 2023 and February 2024 (including data analysis). Abstract versions utilizing the same topic and data were written by a surgical trainee or a senior physician or generated by chatbot 1 and chatbot 2 for comparison. The 10 training abstracts were written by 8 surgical residents or fellows, edited by the same senior surgeon, at a high-volume hospital in the Southeastern US with an emphasis on outcomes-based research. Abstract comparison was then based on 10 abstracts written by 5 surgical trainees within the first 6 months of their research year, edited by the same senior author. The primary outcome measurements were the abstract grades using 10- and 20-point scales and ranks (first to fourth). Abstract versions by chatbot 1, chatbot 2, junior residents, and the senior author were compared and judged by blinded surgeon-reviewers as well as both chatbot models. Five academic attending surgeons from Denmark, the UK, and the US, with extensive experience in surgical organizations, research, and abstract evaluation, served as reviewers. Surgeon-reviewers were unable to differentiate between abstract versions. Each reviewer ranked an AI-generated version first at least once.
Abstracts demonstrated no difference in their median (IQR) 10-point scores (resident, 7.0 [6.0-8.0]; senior author, 7.0 [6.0-8.0]; chatbot 1, 7.0 [6.0-8.0]; chatbot 2, 7.0 [6.0-8.0]; P = .61), 20-point scores (resident, 14.0 [12.0-17.0]; senior author, 15.0 [13.0-17.0]; chatbot 1, 14.0 [12.0-16.0]; chatbot 2, 14.0 [13.0-16.0]; P = .50), or rank (resident, 3.0 [1.0-4.0]; senior author, 2.0 [1.0-4.0]; chatbot 1, 3.0 [2.0-4.0]; chatbot 2, 2.0 [1.0-3.0]; P = .14). The abstract grades given by chatbot 1 were comparable to the surgeon-reviewers' grades. However, chatbot 2 graded more favorably than the surgeon-reviewers and chatbot 1. Median (IQR) chatbot 2-reviewer grades were higher than surgeon-reviewer grades of all 4 abstract versions (resident, 14.0 [12.0-17.0] vs 16.9 [16.0-17.5]; P = .02; senior author, 15.0 [13.0-17.0] vs 17.0 [16.5-18.0]; P = .03; chatbot 1, 14.0 [12.0-16.0] vs 17.8 [17.5-18.5]; P = .002; chatbot 2, 14.0 [13.0-16.0] vs 16.8 [14.5-18.0]; P = .04). When comparing the grades of the 2 chatbots, chatbot 2 gave higher median (IQR) grades for abstracts than chatbot 1 (resident, 14.0 [13.0-15.0] vs 16.9 [16.0-17.5]; P = .003; senior author, 13.5 [13.0-15.5] vs 17.0 [16.5-18.0]; P = .004; chatbot 1, 14.5 [13.0-15.0] vs 17.8 [17.5-18.5]; P = .003; chatbot 2, 14.0 [13.0-15.0] vs 16.8 [14.5-18.0]; P = .01). In this cross-sectional study, trained chatbots generated convincing medical abstracts that were indistinguishable from resident or senior author drafts. Chatbot 1 graded abstracts similarly to surgeon-reviewers, while chatbot 2 was less stringent. These findings may assist surgeon-scientists in successfully implementing AI in medical research.

De novo spinal infections are an increasing medical problem. Decision-making between surgical and nonsurgical treatment for de novo spinal infections is often not evidence based and is commonly a case-by-case decision made by a single physician. A scoring system based on the latest evidence might help improve the decision-making process compared with purely radiology-based scoring systems or the judgment of a single senior physician. Patients older than 18 years with an infection of the spine who underwent nonsurgical or surgical treatment between 2019 and 2021 were identified. Clinical data for neurological status, pain, and existing comorbidities were gathered and transferred to an anonymous spreadsheet. Patients without an MR image and a CT scan of the affected spine region were excluded from the investigation. A multidisciplinary expert panel used the Spine Instability Neoplastic Score (SINS), Spinal Instability Spondylodiscitis Score (SISS), and Spinal Infection Treatment Evaluation Score (SITE Score), previously developed by the authors' group, on every clinical case. Each physician of the expert panel gave an individual treatment recommendation for surgical or nonsurgical treatment for each patient. Treatment recommendations formed the expert panel opinion, which was used to calculate predictive validities for each score. A total of 263 patients with spinal infections were identified. After the exclusion of duplicate patients, patients without de novo infections, or those without CT and MRI scans, 123 patients remained for the investigation. Overall, 70.70% of patients were treated surgically and 29.30% were treated nonoperatively. Intraclass correlation coefficients (ICCs) for the SITE Score, SINS, and SISS were 0.94 (95% CI 0.91-0.95, p < 0.01), 0.65 (95% CI 0.91-0.83, p < 0.01), and 0.80 (95% CI 0.91-0.89, p < 0.01).
In comparison with the expert panel decision, the SITE Score reached a sensitivity of 96.97% and a specificity of 81.90% for all included patients. For potentially unstable and unstable lesions, the SISS and the SINS yielded sensitivities of 84.42% and 64.07%, respectively, and specificities of 31.16% and 56.52%, respectively. The SITE Score showed higher overall sensitivity with 97.53% and a higher specificity for patients with epidural abscesses (75.00%) compared with potentially unstable and unstable lesions for the SINS and the SISS. The SITE Score showed a significantly higher agreement for the definitive treatment decision regarding the expert panel decision, compared with the decision by a single physician for patients with spondylodiscitis, discitis, or spinal osteomyelitis. The SITE Score shows high sensitivity and specificity regarding the treatment recommendation by a multidisciplinary expert panel. The SITE Score shows higher predictive validity compared with radiology-based scoring systems or a single physician and demonstrates a high validity for patients with epidural abscesses.
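
As a rough illustration of how sensitivity and specificity against a reference decision can be computed, the following minimal Python sketch treats the expert panel opinion as the reference and a score's recommendation as the test. The function name and the example data are hypothetical and are not taken from the study; the abstract does not describe the authors' actual analysis code.

```python
def sensitivity_specificity(score_recommends_surgery, panel_recommends_surgery):
    """Return (sensitivity, specificity) of a score against a reference decision.

    Both arguments are equal-length sequences of booleans, one entry per patient.
    The panel decision is treated as the reference standard (an assumption
    made for this sketch).
    """
    if len(score_recommends_surgery) != len(panel_recommends_surgery):
        raise ValueError("both sequences must have the same length")

    pairs = list(zip(score_recommends_surgery, panel_recommends_surgery))
    tp = sum(1 for s, p in pairs if s and p)          # score and panel: surgery
    fn = sum(1 for s, p in pairs if not s and p)      # score missed a surgical case
    tn = sum(1 for s, p in pairs if not s and not p)  # score and panel: nonsurgical
    fp = sum(1 for s, p in pairs if s and not p)      # score over-called surgery

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one surgical and one nonsurgical reference case")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical example data for six patients (not study data).
example_score = [True, True, False, True, False, False]
example_panel = [True, True, True, False, False, False]
example_sens, example_spec = sensitivity_specificity(example_score, example_panel)
```

In this made-up example, two of three panel-surgical cases are flagged by the score and two of three panel-nonsurgical cases are correctly left nonsurgical, so both values come out to 2/3.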

Articles published on Senior Physician

Deep learning based clinical target volumes contouring for prostate cancer: Easy and efficient application.

Preparing porcine lens to mimic human lens capsule.

Anxiety in Endometriosis Patients: Implications for Clinical Practice

Deep learning radiomics based on ultrasound images for the assisted diagnosis of chronic kidney disease.

Comparison of Medical Research Abstracts Written by Surgical Trainees and Senior Surgeons or Generated by Large Language Models

Advanced Practice Provider Professional Advancement Model: A 10-Year Experience

Ongoing decision-making dilemma for treatment of de novo spinal infections: a comparison of the Spinal Infection Treatment Evaluation Score with the Spinal Instability Spondylodiscitis Score and Spine Instability Neoplastic Score.

The Impact of Involving a Senior Emergency Physician in the Triage Process

Randomized controlled open-label trial to evaluate prioritization software for the secondary triage of patients in the pediatric emergency department

Effectiveness of nirmatrelvir-ritonavir versus azvudine for adult inpatients with severe or critical COVID-19

Complications Rate and a Multidimensional Analysis of Their Causes of Tube Thoracostomy: A Mixed-Methods Study.

Abstract 3569: Using AI to automatically process data from unstructured health records of patients with lung cancer

Retrospective study on unilateral polyotia combined with microtia utilizing the technique of preserving residual ear tissue

Use of a Handheld Ultrasonographic Device to Identify Heart Failure and Pulmonary Disease in Rural Africa

Deep-learning-based renal artery stenosis diagnosis via multimodal fusion.

Heterogeneity in the role of emergency physicians and treatment of acute atrial fibrillation in emergency departments—results of the International Atrial Fibrillation Background (AFiB) Study

Cost-effectiveness in an interprofessional training ward within a university department for internal medicine: a monocentric open-label controlled study of the A-STAR Regensburg.

Bridging AI and Clinical Practice: Integrating Automated Sleep Scoring Algorithm with Uncertainty-Guided Physician Review.

TRAINING IN HEALTH AND AGING POLICY MID CAREER—REFLECTING ON A POTENTIAL INFLECTION POINT

Application of Patient Sentiment Analysis to Evaluate Glaucoma Care
