Managing the effect of magnetic resonance imaging pulse sequence on radiomic feature reproducibility in the study of brain metastases

Drew Mitchell, Brandon Curl, Caroline Chung, Ho-Ling Liu, Sherise D Ferguson, Dima Suki, Lily Erickson, Samantha Buszek, Benjamin Tran, Suprateek Kundu, Jodi Goldman, Maguy Farhat

doi:10.12688/f1000research.122871.1

Open Access

Abstract

Background: Despite the promise of radiomics studies, their limited reproducibility has hindered meaningful clinical translation. Variability in study design, image acquisition, and image processing contributes to unreproducible radiomic results. The purpose of this work was to (i) quantitatively compare the variability of radiomic features extracted from 2-D spin echo (SE) and 3-D spoiled gradient echo (SPGR) T1-weighted post-contrast magnetic resonance (MR) images of brain metastases acquired from the same patient in a single imaging session, and (ii) provide a framework to inform data acquisition for reproducible radiomics studies.
Methods: A retrospective cohort was identified of 29 patients with pathologically confirmed brain metastases and contrast-enhanced T1-weighted MR images acquired using both 2-D SE and 3-D SPGR sequences within one exam. Metastases were segmented twice, by different physicians, using semi-automated methods. Radiomic features were extracted using PyRadiomics for 264 combinations of preprocessing variables. Lin’s concordance correlation coefficient (CCC) was computed between features extracted from images acquired with the two pulse sequences and between features extracted from the two tumor segmentations.
Results: By clustering features and processing variables and identifying those with low concordance, we derived general recommendations to improve MR-based radiomic feature reproducibility. Median CCC between 2-D SE and 3-D SPGR (measuring feature agreement between pulse sequences) was greater for fixed bin count intensity discretization (0.76 versus 0.63) and for specific high-concordance features (0.74 versus 0.53). Applying all recommendations improved median CCC from 0.51 to 0.79. Median CCC between contours (measuring feature sensitivity to inter-observer variability) was higher for 2-D SE than for 3-D SPGR (0.93 versus 0.86), but improved to 0.93 for 3-D SPGR after low-concordance features were excluded.
Conclusions: The following recommendations are proposed to improve reproducibility: 1) for all studies, using fixed bin count intensity discretization; 2) for studies with both 2-D and 3-D datasets, excluding high-variability features from downstream analyses; 3) when segmentation is manual or semi-automated, using only 2-D SE images or excluding features susceptible to segmentation variability.
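
As an illustration of the two quantities the abstract relies on, the following minimal Python sketch computes Lin's CCC for one feature measured two ways, and applies a fixed bin count discretization to a set of intensities. It is not the authors' code and it does not call PyRadiomics. The function names `lins_ccc` and `discretize_fixed_bin_count`, the default of 32 bins, and the toy values are assumptions made for illustration. The CCC uses population (1/n) variances, and the discretization follows the common relative-binning formula, which may differ in detail from what PyRadiomics does internally.

```python
import numpy as np


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between paired measurements.

    Hypothetical helper: x and y hold one radiomic feature measured twice
    (e.g. on 2-D SE and 3-D SPGR images) for the same set of lesions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must be 1-D arrays of equal length >= 2")

    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()  # population variances (ddof=0)
    cov_xy = np.mean((x - mean_x) * (y - mean_y))

    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0.0:
        return float("nan")  # both inputs constant and equal: undefined
    return float(2.0 * cov_xy / denom)


def discretize_fixed_bin_count(intensities, n_bins=32):
    """Map intensities to integer bins 1..n_bins spanning their own range.

    Assumed formula: floor(n_bins * (v - min) / (max - min)) + 1,
    with the maximum value placed in the last bin.
    """
    v = np.asarray(intensities, dtype=float)
    lo, hi = v.min(), v.max()
    if hi == lo:
        return np.ones(v.shape, dtype=int)  # flat region: single bin
    bins = np.floor(n_bins * (v - lo) / (hi - lo)).astype(int) + 1
    return np.minimum(bins, n_bins)


# Toy values (not study data): a constant offset between two measurements
# leaves Pearson correlation at 1 but lowers the CCC.
feature_a = np.array([1.0, 2.0, 3.0, 4.0])
feature_b = feature_a + 1.0
ccc_identical = lins_ccc(feature_a, feature_a)
ccc_offset = lins_ccc(feature_a, feature_b)

# Relative binning is unchanged by a linear rescaling of the intensities.
raw = np.array([0.0, 5.0, 10.0])
bins_raw = discretize_fixed_bin_count(raw, n_bins=4)
bins_scaled = discretize_fixed_bin_count(3.0 * raw + 100.0, n_bins=4)
```

In this sketch the CCC penalizes both scatter and systematic shifts between the two measurements, which is why it is a stricter agreement measure than Pearson correlation. The discretization example shows one property of fixed bin count binning (invariance to linear intensity rescaling). Whether that property explains the higher concordance reported above is not established by this sketch.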

Full Text