Inter-reader Variability Research Articles

Background: CT deep learning image reconstruction (DLIR) improves image quality by reducing noise compared with adaptive statistical iterative reconstruction-V (ASIR-V). However, objective assessment of low-contrast lesion detectability with DLIR is lacking.

Purpose: To investigate the low-contrast detectability of hypoattenuating liver lesions on CT scans reconstructed with DLIR compared with CT scans reconstructed with ASIR-V in a patient and a phantom study.

Materials and Methods: This single-center retrospective study included patients undergoing portal venous phase abdominal CT between February and May 2021 and a low-contrast-resolution phantom scanned with the same protocol. Four reconstructions (ASIR-V at 40% strength [ASIR-V 40] and DLIR at low, medium, and high strengths) were generated. Five radiologists qualitatively assessed the images using a five-point Likert scale for image quality, lesion diagnostic confidence, lesion conspicuity, and small-lesion (≤1 cm) visibility. Up to two key lesions per patient, confirmed at histopathologic examination or at prior or follow-up imaging, were included. The lesion-to-background contrast-to-noise ratio (CNR) was calculated, and interreader variability was analyzed. Qualitative and quantitative metrics were compared between each DLIR strength and ASIR-V 40 using proportional odds logistic regression models.

Results: Eighty-six liver lesions (mean size, 15 mm ± 9.5 [SD]) in 50 patients (median age, 62 years [IQR, 57-73 years]; 27 [54%] female patients) were included. No differences were detected in the qualitative low-contrast detectability metrics between ASIR-V 40 and DLIR (P > .05). Quantitatively, medium- and high-strength DLIR yielded higher lesion-to-background CNRs than ASIR-V 40 (medium-strength DLIR vs ASIR-V 40: odds ratio [OR], 1.96 [95% CI: 1.65, 2.33]; high-strength DLIR vs ASIR-V 40: OR, 5.36 [95% CI: 3.68, 7.82]; P < .001). Low-contrast lesion attenuation was 2.8-3.6 HU lower with DLIR. Interreader agreement was moderate to very good for the qualitative metrics. Subgroup analysis by lesion size (larger than 1 cm vs 1 cm or smaller) yielded similar results (P > .05). Qualitatively, phantom study results were similar to those in patients (P > .05).

Conclusion: The detectability of low-contrast liver lesions was similar on CT scans reconstructed with low-, medium-, and high-strength DLIR and with ASIR-V 40 in both the patient and phantom studies. Lesion-to-background CNRs were higher for medium- and high-strength DLIR reconstructions than for ASIR-V 40.

© RSNA, 2024. Supplemental material is available for this article.
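The abstract reports lesion-to-background contrast-to-noise ratio (CNR) without stating its formula. A common definition divides the absolute difference between the mean lesion and mean background attenuation by the background noise, taken as the standard deviation of the background region of interest (ROI). The Python sketch below assumes that definition; the function name and the ROI values are hypothetical and are not data from the study. It shows why a reconstruction that lowers noise without shifting mean attenuation raises CNR: halving the background SD doubles the ratio. It does not reproduce the study's proportional odds analysis.

```python
from statistics import mean, stdev
from typing import Sequence


def lesion_to_background_cnr(lesion_hu: Sequence[float],
                             background_hu: Sequence[float]) -> float:
    """Lesion-to-background contrast-to-noise ratio.

    Assumed definition (hypothetical, not taken from the study):
        CNR = |mean(lesion ROI) - mean(background ROI)| / SD(background ROI)
    where SD is the sample standard deviation of the background ROI,
    used here as the image-noise estimate.
    """
    if len(lesion_hu) == 0:
        raise ValueError("lesion ROI is empty")
    if len(background_hu) < 2:
        raise ValueError("need at least two background pixels to estimate noise")
    noise = stdev(background_hu)
    if noise == 0:
        raise ValueError("background noise is zero; CNR is undefined")
    contrast = abs(mean(lesion_hu) - mean(background_hu))
    return contrast / noise


# Hypothetical ROI samples in HU, for illustration only.
lesion = [60, 62, 58, 61, 59]                   # mean 60 HU
noisy_liver = [100, 110, 90, 105, 95]           # mean 100 HU
denoised_liver = [100, 105, 95, 102.5, 97.5]    # same mean, half the spread

print(round(lesion_to_background_cnr(lesion, noisy_liver), 2))     # 5.06
print(round(lesion_to_background_cnr(lesion, denoised_liver), 2))  # 10.12
```

Some studies estimate noise from a different region (for example, a pooled or separate homogeneous ROI); swapping the denominator changes absolute CNR values but not the direction of the noise effect shown here.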


e23010

Background: The FDA recommends Blinded Independent Central Review (BICR) with double reads for clinical trials that rely on imaging. However, inter-reader variability is a concern in these trials. While studies have investigated the variability of RECIST, the primary response criteria, little attention has been given to how readers are best paired into reading teams. The evaluation of therapeutic response in phase III trials relies on the Date of first Progressive Disease (DoPD), and the Discrepancy Rate (DR) is the preferred index for measuring inter-reader variability in DoPD evaluation. Another important index, reader bias, measures a reader's tendency to over- or underestimate diagnoses. When the two readers disagree, a third reader is brought in for adjudication. However, the impact of adjudication on trials' Progression-Free Survival (PFS) is not well documented. Our study examines this variability in a lung clinical trial using RECIST, analyzing double-reading performance, prediction of reader pairing performance, and the impact of adjudication on PFS estimates.

Methods: We retrospectively analyzed five phase III lung clinical trials that used the RECIST 1.1 criteria in BICR with double reads. The trials involved 7 readers organized into 11 teams; each reader participated in multiple trials and performed more than 50 reads, yielding reviews of 1017 patients. Our analysis included (1) calculation of the DR and bias for each team and testing for correlation between DR and bias; (2) computation of the signed bias for each individual reader; (3) evaluation of a probabilistic model to predict the DR for each team and the bias for each reader; and (4) comparison of PFS between single reads and double reads after adjudication and endorsement of one of the readings.

Results: A multiple comparisons test did not reveal any difference between teams' DRs (Marascuilo test; q > 0.05). The average DR across all teams was 39.9% (95% CI: 37.8, 41.9). However, we found significant differences in bias in 9 of 55 pairwise team comparisons (Marascuilo test; q < 0.05). Absolute bias values ranged from 20% to 100%. We did not find a correlation between bias and DR (p = 0.64). Additionally, when comparing the average bias value per reader, no differences were observed (Marascuilo test; q < 0.05). We were unable to predict teams' DRs or readers' biases. Reader endorsement rates ranged from 18% to 82%. After adjudication, 27% of the PFS estimates were lower than the minimum value obtained from the single readers, in one case 20.6% shorter.

Conclusions: Significant reader bias has a notable impact on double reading, independent of DR values. The performance of one reader cannot be generalized from that of others. Additionally, adjudication significantly affects the PFS of clinical trials. These findings emphasize the importance of considering reader bias and the potential consequences of adjudication in clinical trial assessments.
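The abstract does not give formulas for DR or reader bias. The Python sketch below assumes simple definitions for one two-reader team: DR as the fraction of patients whose two DoPDs differ, and signed bias as the net share of discrepancies in which reader A dates progression earlier than reader B. Under this assumption, bias lies between -100% and +100%, and its absolute value measures how one-sided the disagreements are. The function names, the exact-date matching rule (no tolerance window), and the example data are all hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple

# One entry per patient: (reader A DoPD, reader B DoPD).
# None means that reader recorded no progression during follow-up.
DoPDPair = Tuple[Optional[date], Optional[date]]


def a_calls_pd_first(a: Optional[date], b: Optional[date]) -> bool:
    """True if reader A dates progression earlier than reader B.

    A reading with no progression (None) is treated as later than any date.
    """
    if a is None:
        return False
    return b is None or a < b


def discrepancy_rate_and_bias(pairs: List[DoPDPair]) -> Tuple[float, float]:
    """Return (DR, signed bias) for one two-reader team.

    Assumed definitions (hypothetical, not the authors' exact formulas):
      DR   = share of patients whose two DoPDs differ (exact-date match)
      bias = (discrepancies where A is earlier - those where B is earlier)
             / number of discrepancies, so bias lies in [-1, 1]
    """
    if not pairs:
        raise ValueError("no patients to evaluate")
    discrepant = [(a, b) for a, b in pairs if a != b]
    dr = len(discrepant) / len(pairs)
    if not discrepant:
        return dr, 0.0
    # In a discrepant pair exactly one reader is earlier, so B's count is the rest.
    a_first = sum(1 for a, b in discrepant if a_calls_pd_first(a, b))
    b_first = len(discrepant) - a_first
    return dr, (a_first - b_first) / len(discrepant)


# Hypothetical team of two readers reviewing five patients.
team = [
    (date(2021, 3, 1), date(2021, 3, 1)),  # agree
    (date(2021, 5, 1), date(2021, 7, 1)),  # A earlier
    (date(2021, 6, 1), None),              # A calls PD, B does not
    (None, None),                          # agree: no PD
    (date(2021, 9, 1), date(2021, 8, 1)),  # B earlier
]
dr, bias = discrepancy_rate_and_bias(team)
print(f"DR = {dr:.0%}, signed bias = {bias:+.0%}")  # DR = 60%, signed bias = +33%
```

Because the two quantities are computed from different parts of the data (DR from all patients, bias only from the discrepant ones), one does not determine the other: a team can have a moderate DR with either balanced or strongly one-sided disagreements.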


Articles published on Inter-reader Variability

BIOM-54. AI-Driven Risk-of-Progression (AIRIP) Classifier for Distinguishing Recurrent Brain Metastases from Radiation Treatment Effect: A Multi-Institutional Comparative Study with Advanced Multimodal Imaging

Factors of interobserver variability in prostate tumor MRI delineation: impact of PI-QUAL score

Reproducibility of Cardiac Multifrequency MR Elastography in Assessing Left Ventricular Stiffness and Viscosity

ProstateZones – Segmentations of the prostatic zones and urethra for the PROSTATEx dataset

Detectability of Hypoattenuating Liver Lesions with Deep Learning CT Reconstruction: A Phantom and Patient Study

Deep-learning model accurately classifies multi-label lung ultrasound findings, enhancing diagnostic accuracy and inter-reader agreement

Clinical feasibility of deep learning–accelerated single-shot turbo spin echo sequence with enhanced denoising for pancreas MRI at 3 Tesla

Artificial intelligence-based quantification of pulmonary HRCT (AIqpHRCT) for the evaluation of interstitial lung disease in patients with inflammatory rheumatic diseases

Three-Dimensional Transthoracic Echocardiography for Semiautomated Analysis of the Tricuspid Annulus: Validation and Normal Values

Use of reporting templates for chest radiographs in a coronavirus disease 2019 context: measuring concordance of radiologists with three international templates

Intraindividual reproducibility of myocardial radiomic features between energy-integrating detector and photon-counting detector CT angiography

Comparison of best landmarks for calculating fetal jaw measurements by ultrasound and MRI in micrognathia

Interreader and Intermodality Variability in Macular Atrophy Quantification in Neovascular Age-related Macular Degeneration: Comparison of 6 Imaging Modalities

Advances in multiparametric magnetic resonance imaging combined with biomarkers for the diagnosis of high-grade prostate cancer

Influence of Gadolinium-based Contrast Media and Inter-reader Variation on the Estimation of Intravoxel Incoherent Motion (IVIM) Parameters in Breast MR Imaging

Diagnostic accuracy and reliability of CT-based Node-RADS for colon cancer

Double reading performance and the impact of adjudication on progression-free survival estimations: Findings from a lung clinical trial

Promptable foundation model for automatic whole body RECIST measurement

Explainable Precision Medicine in Breast MRI: A Combined Radiomics and Deep Learning Approach for the Classification of Contrast Agent Uptake

Improving assessment of lesions in longitudinal CT scans: a bi-institutional reader study on an AI-assisted registration and volumetric segmentation workflow
