Human Readers Research Articles

Background: The effectiveness of different large language models (LLMs) in extracting important information from CT angiography (CTA) radiology reports is unknown. The Coronary Artery Disease-Reporting and Data System (CAD-RADS) is a standardized reporting system for CTA that categorizes the severity of coronary artery disease to aid clinical decision-making. Purpose: To evaluate the agreement between human readers and LLMs in assigning CAD-RADS categories based on CTA reports. Materials and Methods: This retrospective study analyzed reports from patients who underwent coronary CTA for screening or diagnosis at Johns Hopkins Hospital. Reports with CAD-RADS 0–5 findings were collected from January 2020 to October 2023. Initially, 6,212 CTA reports were retrieved. After excluding non-organized reports and retaining only those with CAD-RADS scores, 590 reports remained. Board-certified radiologists and three LLMs (GPT-3.5, GPT-4, and Llama3) assigned CAD-RADS categories based on the original radiologists' findings. Two radiologists reviewed the CAD-RADS scores, with only one discrepancy in the 590 reports, indicating high inter-reader agreement. Agreement between human readers and LLMs for CAD-RADS categories was assessed using the Gwet agreement coefficient (AC1). Frequencies of changes in CAD-RADS category assignments that could potentially affect clinical management (CAD-RADS 0–2 vs. 3–5) were calculated and compared. Typically, CAD-RADS 0–2 indicates mild disease managed conservatively, while CAD-RADS 3–5 prompts further testing or indicates disease requiring aggressive intervention to prevent adverse cardiovascular events. Results: Among 590 reports, agreement between original and reviewing radiologists was almost perfect (AC1 = 0.985). Good agreement was also found between original radiologists and Llama3, GPT-3.5, and GPT-4 (AC1 = 0.861, 0.907, and 0.941, respectively).
CAD-RADS category changes potentially affecting clinical management differed between readers. Upgrades occurred in 0 of 590 reports (0%) for human readers, 56 of 590 (9.4%) for Llama3, 33 of 590 (5.5%) for GPT-3.5, and 23 of 590 (3.9%) for GPT-4. Downgrades occurred in 8 of 590 reports (1.4%) for human readers, 19 of 590 (3.2%) for Llama3, 22 of 590 (3.7%) for GPT-3.5, and 12 of 590 (2.0%) for GPT-4. Conclusion: LLMs showed good agreement with human readers in assigning CAD-RADS categories on CTA reports. However, they were more likely than radiologists to upgrade the CAD-RADS category.
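The two quantities reported in this abstract, the Gwet AC1 coefficient and the count of management-relevant category changes, can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the standard two-rater form of AC1 and treats an "upgrade" as a move from CAD-RADS 0–2 to 3–5 relative to the reference reader (and a "downgrade" as the reverse), which is one reading of the abstract. The function names `gwet_ac1` and `management_shifts` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def gwet_ac1(rater_a: Sequence[int], rater_b: Sequence[int],
             categories: Iterable[int] = range(6)) -> float:
    """Gwet's AC1 for two raters over a fixed set of nominal categories."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of reports")
    cats = list(categories)
    n = len(rater_a)
    # Observed agreement: share of reports given the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Pooled prevalence of each category across both raters.
    counts = Counter(rater_a) + Counter(rater_b)
    prevalence = [counts[c] / (2 * n) for c in cats]
    # Chance agreement as defined for AC1.
    chance = sum(p * (1 - p) for p in prevalence) / (len(cats) - 1)
    return (observed - chance) / (1 - chance)


def management_shifts(reference: Sequence[int], candidate: Sequence[int],
                      threshold: int = 3) -> Tuple[int, int]:
    """Count reports crossing the CAD-RADS 0-2 vs. 3-5 boundary.

    Assumption: an upgrade means the reference is below the threshold and the
    candidate is at or above it; a downgrade is the reverse.
    """
    upgrades = sum(r < threshold <= c for r, c in zip(reference, candidate))
    downgrades = sum(c < threshold <= r for r, c in zip(reference, candidate))
    return upgrades, downgrades


if __name__ == "__main__":
    # Made-up toy scores, not data from the study.
    original = [0, 2, 3, 5, 1]
    model = [0, 3, 2, 5, 4]
    print(gwet_ac1(original, model))
    print(management_shifts(original, model))
```

Fixing the category set at 0–5 rather than inferring it from the observed scores keeps the chance-agreement term comparable across readers even when one reader never uses a given category.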

Introduction: In cardiac computed tomography (CT) examinations, non-contrast scans are often performed in addition to contrast-enhanced scans to quantify the coronary artery calcification (CAC) score. These calcium score scans are associated with additional exposure to ionizing radiation. Purpose: We sought to develop a fully automated artificial intelligence (AI)-based algorithm capable of determining the CAC score based solely on contrast-enhanced CT scans, eliminating the need for additional non-contrast scans. Methods: An automated CAC scoring algorithm for contrast-enhanced CT scans was developed and trained using a dataset of 297 cardiac CT studies. Coronary artery calcifications were manually segmented in contrast-enhanced scans using a threshold of 2 standard deviations above the mean attenuation value of the ascending aorta. On non-contrast scans, CAC was manually assessed using a standard threshold of ≥130 Hounsfield Units. A correction factor for calcium score results assessed on contrast-enhanced scans was determined using linear regression. The algorithm was tested on an independent set of 90 contrast-enhanced CT scans (four manufacturers, eight scanner models). We compared automated CAC scores to manual expert reader reference assessments on non-contrast CT scans. CAC scores were categorized into four risk categories following the Society of Cardiovascular Computed Tomography recommendations: 0, 1–100, 101–300, and >300 Agatston Units. Results: In the assessment of 90 CT studies (mean age 61.0±11.4 years, 46.7% males), the AI model detected CAC in 69 contrast-enhanced scans (76.7%), comparable to the human reader's detection rate of CAC in 71 non-contrast scans (78.9%) (p = 0.63). The CAC score was first calculated using AI-based segmentation of coronary calcification on contrast-enhanced scans, and the result was then multiplied by the established linear correction factor of 1.97.
There was an excellent correlation between AI-model and manual reference total CAC scores (Pearson’s r = 0.96, 95% CI 0.94–0.97, p < 0.001). The model correctly classified 77 patients (85.6%) into the same CAC risk category as the human reader (Figure 1). Among 19 patients (21.1%) with a CAC score of zero, only 1 patient (5.3%) was reclassified with a non-zero CAC score by the AI model. Cohen’s kappa value for CAC score risk categorization was 0.80 (p < 0.001), indicating very good agreement (Figure 2). Bland–Altman analysis revealed a minimal bias of -9.7 Agatston Units, with 95% limits of agreement ranging from -184.8 to 165.5 Agatston Units. Conclusions: The CAC score can be accurately quantified on contrast-enhanced cardiac CT scans using an automated AI-based algorithm. This approach has the potential to eliminate the need for an additional non-contrast CT scan, thereby reducing the patient's exposure to ionizing radiation.
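The post-processing described in this abstract (scaling by the correction factor, binning into the four risk categories, and Bland–Altman summary statistics) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the factor 1.97 and the category cut-offs come from the abstract, while the function names, the handling of fractional scores between 0 and 1, and the use of the sample standard deviation with a 1.96 multiplier are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

# Linear correction factor reported in the abstract.
CORRECTION_FACTOR = 1.97


def corrected_cac(raw_contrast_score: float) -> float:
    """Scale a score measured on a contrast-enhanced scan by the reported factor."""
    return raw_contrast_score * CORRECTION_FACTOR


def risk_category(score: float) -> str:
    """Map an Agatston score to one of the four categories named in the abstract.

    Assumption: any positive score up to 100 falls into "1-100".
    """
    if score < 0:
        raise ValueError("Agatston scores cannot be negative")
    if score == 0:
        return "0"
    if score <= 100:
        return "1-100"
    if score <= 300:
        return "101-300"
    return ">300"


def bland_altman(method_a: Sequence[float],
                 method_b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of (a - b); limits are bias +/- 1.96 sample standard deviations.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    # Made-up toy values, not data from the study.
    ai_scores = [corrected_cac(s) for s in (0.0, 40.0, 100.0, 200.0)]
    print([risk_category(s) for s in ai_scores])
    print(bland_altman(ai_scores, [0.0, 85.0, 190.0, 410.0]))
```

Keeping the correction step separate from the categorization step makes it easy to check how sensitive the risk category is to the choice of correction factor.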

Articles published on Human Readers

STEED: A data mining tool for automated extraction of experimental parameters and risk of bias items from in vivo publications.

Utilizing a domain-specific large language model for LI-RADS v2018 categorization of free-text MRI reports: a feasibility study

Abstract 4119869: Automatic assignment of CAD-RADS categories in coronary CTA reports using large language model

Abstract 4144553: Fully Automated Machine Learning Based Echocardiographic Assessment of Global Longitudinal Strain in Breast Cancer Patients Receiving Cardiotoxic Chemotherapy

NIMG-63. LEVERAGING LLMS FOR ACCURATE DIFFERENTIATION OF RADIATION NECROSIS AND TUMOR PROGRESSION IN BRAIN MRI REPORTS: A STUDY ON AUTOMATED SCORING AND CLINICAL IMPLICATIONS

A critical comparative study of the performance of three AI-assisted programs for bone age determination.

Deep Learning Algorithms for Breast Cancer Detection in a UK Screening Cohort: As Stand-alone Readers and Combined with Human Readers.

Diagnostic performance of an artificial intelligence model for the detection of pneumothorax at chest X-ray

Toward Foundation Models in Radiology? Quantitative Assessment of GPT-4V's Multimodal and Multianatomic Region Capabilities.

"Artificial intelligence Reading Digital Mammogram: Enhancing Detection and Differentiation of Suspicious Microcalcifications".

AI-based algorithm for assessing coronary artery calcium score on contrast-enhanced cardiac computed tomography scans

Validation of Inter-Reader Agreement/Consistency for Quantification of Ellipsoid Zone Integrity and Sub-RPE Compartmental Features Across Retinal Diseases.

Post-deployment performance of a deep learning algorithm for normal and abnormal chest X-ray classification: A study at visa screening centers in the United Arab Emirates

“Let faith oust fact; let fancy oust memory”: Melville, Media and Narratives

Automatic structuring of radiology reports with on-premise open-source large language models.

Coronary artery disease detection using deep learning and ultrahigh-resolution photon-counting coronary CT angiography

Assessment of the Breast Density Prevalence in Swiss Women with a Deep Convolutional Neural Network: A Cross-Sectional Study

Galileo-an Artificial Intelligence tool for evaluating pre-implantation kidney biopsies.

Evaluation of an AI-Driven Deep Learning Model in Predicting Mammographic Breast Density

Deep learning-based detection and classification of intracranial tumors on magnetic resonance imaging
