Radiology Reports Research Articles

Background: The effectiveness of different large language models (LLMs) in extracting important information from CT angiography (CTA) radiology reports is unknown. The Coronary Artery Disease-Reporting and Data System (CAD-RADS) is a standardized reporting system for coronary CTA that categorizes the severity of coronary artery disease to aid clinical decision-making. Purpose: To evaluate the agreement between human readers and LLMs in assigning CAD-RADS categories based on CTA reports. Materials and Methods: This retrospective study analyzed reports from patients who underwent coronary CTA for screening or diagnosis at Johns Hopkins Hospital. Reports with CAD-RADS 0–5 findings were collected from January 2020 to October 2023. Initially, 6,212 CTA reports were retrieved. After excluding non-organized reports and retaining only those with CAD-RADS scores, 590 reports remained. Board-certified radiologists and three LLMs (GPT-3.5, GPT-4, and Llama3) assigned CAD-RADS categories based on the original radiologists' findings. Two radiologists reviewed the CAD-RADS scores, with only one discrepancy in the 590 reports, indicating high inter-reader agreement. Agreement between human readers and LLMs for CAD-RADS categories was assessed using the Gwet agreement coefficient (AC1). Frequencies of changes in CAD-RADS category assignments that could potentially affect clinical management (CAD-RADS 0–2 vs. 3–5) were calculated and compared. Typically, CAD-RADS 0–2 indicates mild disease managed conservatively, while CAD-RADS 3–5 indicates disease that warrants further testing or aggressive intervention to prevent adverse cardiovascular events. Results: Among 590 reports, agreement between original and reviewing radiologists was almost perfect (AC1 = 0.985). Good agreement was also found between original radiologists and Llama3, GPT-3.5, and GPT-4 (AC1 = 0.861, 0.907, and 0.941, respectively). Category changes potentially affecting clinical management differed between readers. Upgrades occurred in 0 of 590 reports (0%) for human readers, 56 of 590 (9.4%) for Llama3, 33 of 590 (5.5%) for GPT-3.5, and 23 of 590 (3.9%) for GPT-4. Downgrades occurred in 8 of 590 reports (1.4%) for human readers, 19 of 590 (3.2%) for Llama3, 22 of 590 (3.7%) for GPT-3.5, and 12 of 590 (2.0%) for GPT-4. Conclusion: LLMs showed good agreement with human readers in assigning CAD-RADS categories on CTA reports. However, they were more likely than radiologists to upgrade the CAD-RADS category.
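For readers unfamiliar with the agreement statistic, the following is a minimal sketch of how Gwet's AC1 for two raters and the management-relevant category changes could be computed. It is not the study's code: the function names, the toy scores, and the reading of "upgrade" and "downgrade" as crossing the CAD-RADS 0–2 vs. 3–5 boundary are assumptions made for illustration.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters scoring the same items on a nominal scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    if len(categories) < 2:
        raise ValueError("at least two categories are required")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: sum of pi_k * (1 - pi_k) over categories, divided by K - 1,
    # where pi_k is the mean proportion of items placed in category k.
    chance = 0.0
    for category in categories:
        pi = (counts_a[category] + counts_b[category]) / (2 * n)
        chance += pi * (1 - pi)
    chance /= len(categories) - 1
    return (observed - chance) / (1 - chance)


def management_changes(reference, candidate, threshold=3):
    """Count candidate scores that cross the assumed 0-2 vs. 3-5 boundary."""
    pairs = list(zip(reference, candidate))
    upgrades = sum(ref < threshold <= cand for ref, cand in pairs)
    downgrades = sum(cand < threshold <= ref for ref, cand in pairs)
    return upgrades, downgrades


if __name__ == "__main__":
    # Toy scores, invented for illustration; not data from the study.
    radiologist = [0, 1, 2, 3, 4, 5, 2, 3]
    model = [0, 1, 3, 3, 4, 5, 2, 2]
    print(f"AC1 = {gwet_ac1(radiologist, model, range(6)):.3f}")
    print("upgrades, downgrades =", management_changes(radiologist, model))
```

Unlike Cohen's kappa, the chance-agreement term in AC1 is bounded by 1/K for K categories, which makes the coefficient less sensitive to skewed category prevalence.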

Introduction: The prevalence of hypertrophic cardiomyopathy (HCM) in the UK Biobank based on ICD-10 codes (0.07%) is lower than global estimates of disease prevalence (0.2–0.5%). Prior studies using these data have remarked on the limitations of their findings given likely underdiagnosis. The availability of cardiac MRI scans for a fraction of the participants offers an opportunity to identify missed diagnoses. Aims: This study seeks to use a generalizable deep learning model to detect likely cases of undiagnosed HCM from cardiac MRIs in the UK Biobank. Methods: The foundational model was trained on a multi-institutional dataset of 14,073 cardiac MRIs via a self-supervised contrastive learning approach that sought to minimize the divergence between scans and their associated radiology reports. The pre-trained model was fine-tuned to diagnose HCM on a distinct cohort of 4,870 MRIs with 368 cases of HCM, achieving an AUC of 0.94. The fine-tuned model was applied to the UK Biobank cardiac MRI dataset to obtain predicted probabilities of HCM. Cases exceeding a threshold of 95%, corresponding to the top 0.5% of cases (expected specificity of 97% and sensitivity of 60%), were screened in for manual reading. In a blinded fashion, a board-certified radiologist was tasked with diagnosing HCM on a sample of cases composed of high and low predicted probabilities. Results: Of the 43,017 patients with cardiac MRIs, only 9 (0.02%) had an ICD diagnosis of HCM. A total of 266 cardiac MRIs were manually reviewed: 216 had greater than 95% predicted probability of HCM, and 50 negative controls were randomly selected from cases with predicted probability less than 10%. The radiologist concurred with an HCM diagnosis for 115 cases (sensitivity 53%, specificity 98%), 112 of which were previously undiagnosed. The prevalence of hypertension and aortic stenosis did not significantly differ between the cohort of true positives (69.2%) and false positives (76.6%). The corrected prevalence of HCM in the UK Biobank MRI cohort is estimated at 0.28%. Conclusions: The findings of this study illustrate the remarkable ability of a generalizable deep learning model to detect undiagnosed cases of a rare disease process from cardiac MRIs. This is an important milestone that may allow for widespread screening of HCM while minimizing demand for radiologist labor, and thereby allow patients to reap the substantial benefits of earlier treatment.
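The abstract does not state how the corrected prevalence was derived. One reading that reproduces the reported figures, sketched below, adds the 112 previously undiagnosed cases to the 9 ICD-coded cases and divides by the 43,017 patients with cardiac MRIs. The helper name and this formula are assumptions, not the authors' stated method.

```python
def prevalence_percent(cases, cohort_size):
    """Prevalence as a percentage of the cohort."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return 100 * cases / cohort_size


# Counts taken from the abstract.
COHORT_SIZE = 43_017      # patients with cardiac MRIs
ICD_CODED = 9             # patients with an ICD diagnosis of HCM
NEWLY_IDENTIFIED = 112    # radiologist-confirmed cases without a prior diagnosis

coded = prevalence_percent(ICD_CODED, COHORT_SIZE)
# Assumed formula: coded cases plus newly identified cases over the same cohort.
corrected = prevalence_percent(ICD_CODED + NEWLY_IDENTIFIED, COHORT_SIZE)

print(f"ICD-coded prevalence: {coded:.2f}%")      # prints 0.02%
print(f"Corrected prevalence: {corrected:.2f}%")  # prints 0.28%
```

Because only scans above the 95% probability threshold and a small control sample were manually read, an estimate built this way would count confirmed cases only and says nothing about cases the model scored lower.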

Articles published on Radiology Reports

Extraction of clinical data on major pulmonary diseases from unstructured radiologic reports using a large language model.

Magnetic Resonance Imaging Template to Standardize Reporting of Evacuation Disorders

Auto-Rad: End-to-End Report Generation from Lumber Spine MRI Using Vision–Language Model

Prevalence and clinical impact of radiographic sclerotic lines adjacent to cementless tibial stems in revision total knee arthroplasty: a long-term follow-up study.

Centralized Investigator Review of Radiological and Functional Imaging Reports in Real-World Oncology Studies: The SACHA-France Experience.

The impact of different radiology report formats on patient information processing: a systematic review.

EDI&S SO04 Accuracy of Outsourced Radiology Reports in Emergency Surgical Care; Do they provide a high-quality cost-effective service?

Documentation of incidentally noted hepatic steatosis to emergency department patients: A retrospective study.

Implementation of structured radiology reporting and its associated accuracy in comparison to pancreas multi-disciplinary clinic expert radiology review.

Abstract 4119869: Automatic assignment of CAD-RADS categories in coronary CTA reports using large language model

Discrepancy Rates in Acute Abdominal CT: An Audit of In-House and Remote Reporting Compared to Intraoperative Laparoscopic and Laparotomy Findings.

Abstract 4124675: Deep Learning Screening of Cardiac MRIs Uncovers Undiagnosed Hypertrophic Cardiomyopathy in the UK BioBank

Abstract 4139198: A Systematic Approach to Prompting Large Language Models for Automated Feature Extraction from Cardiovascular Imaging Reports

ChatGPT vs Gemini: Comparative Accuracy and Efficiency in CAD-RADS Score Assignment from Radiology Reports.

NIMG-63. LEVERAGING LLMS FOR ACCURATE DIFFERENTIATION OF RADIATION NECROSIS AND TUMOR PROGRESSION IN BRAIN MRI REPORTS: A STUDY ON AUTOMATED SCORING AND CLINICAL IMPLICATIONS

ChatGPT and radiology report: potential applications and limitations.

Collaboration between clinicians and vision-language models in radiology report generation.

Accuracy of Outsourced Radiology Reports in Emergency Surgical Care: Do They Provide a High-Quality, Cost-Effective Service?

Clinical Characteristics and Outcomes of Hospitalized AECOPDs Secondary to SARS-CoV-2 versus Other Respiratory Viruses.

Precise Image-level Localization of Intracranial Hemorrhage on Head CT Scans with Deep Learning Models Trained on Study-level Labels.
