Reproducibility of Head and Neck MRI Radiomic Features Between Two Common Analysis Packages

J. Korte, C.E. Cardenas, T. Kron, N. Hardcastle, J. Wang, H. Bahig, B. Elgohari, L.E. Court, C.D. Fuller, S.P. Ng

doi:10.1016/j.ijrobp.2020.07.220

Abstract

Radiomics analysis generates hundreds of image-based features, making feature reduction a crucial step to prevent overfitting when developing a radiomics model. Here we investigate the reproducibility of head and neck MRI radiomic features calculated with two open-source software packages, IBEX and PyRadiomics. Radiomic features were calculated on 312 apparent diffusion coefficient (ADC) maps from the PREDICT-HN prospective trial, in which 58 head and neck cancer patients were imaged prior to and throughout radiotherapy. Diffusion-weighted images were acquired on a Siemens 1.5 T Aera scanner using the BLADE technique. The gross tumor volume was contoured on T2-weighted turbo-spin-echo (T2w-TSE) images, and the contours were propagated onto the ADC maps and visually verified. Common features (n = 47) of IBEX and PyRadiomics were identified by name and calculated with feature extraction settings matched as closely as possible using the available documentation. Intensity histogram (IHIST), shape, grey-level co-occurrence matrix (GLCM), grey-level run length matrix (GLRLM) and neighborhood grey-tone difference matrix (NGTDM) features were calculated on the original ADC maps only. To determine the relationship between features generated with IBEX and PyRadiomics, linear regression analysis was performed on the ADC maps, and a subset of reproducible features with a high Pearson correlation coefficient (r ≥ 0.9) was identified. Unsupervised learning was used to show the potential impact of incorporating non-reproducible features in a radiomics model: separate radiomic models were generated from IBEX and PyRadiomics features, first using all features and then using only the subset of reproducible features. Intensity histogram and shape features correlated highly between IBEX and PyRadiomics, whereas higher-order features (GLCM, GLRLM and NGTDM) were less correlated. Reliable features were identified in the intensity histogram (5/7), shape (5/8), GLCM (3/16, 4/16 and 0/16 for neighborhood settings 1, 4 and 7, respectively), GLRLM (0/11) and NGTDM (2/5) categories.
Clustering based on all features generated very different patient groups from the IBEX and PyRadiomics models, demonstrating how feature reproducibility issues can negatively affect model reproducibility. The IBEX and PyRadiomics models classified patients into identical groups when clustering was based solely on reliable features, suggesting that using a correlation threshold to identify reproducible features may be an adequate method to reduce uncertainty when interpreting radiomic signatures. This work highlights feature and model reproducibility issues arising from the use of different radiomic analysis software. A correlation-threshold method for selecting reproducible features is needed to demonstrate that the features identified with both software packages generate equivalent models.
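The abstract does not state which clustering algorithm or agreement measure was used. As one hedged illustration of how "identical groups" versus "very different groups" could be quantified, the sketch below computes the adjusted Rand index (ARI), a swapped-in choice rather than the authors' stated method, between two cluster labelings of the same patients. An ARI of 1 means the partitions are identical up to relabeling; values near or below 0 indicate chance-level agreement. The label lists are hypothetical toy values, not trial data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same patients."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same patients")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two patients")
    # Joint counts n_ij: patients placed in cluster i by A and cluster j by B.
    contingency = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. all singletons in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical cluster labels for six patients.
ibex_all = [0, 0, 1, 1, 2, 2]    # IBEX model, all features
pyrad_all = [0, 1, 0, 2, 1, 2]   # PyRadiomics model, all features
ibex_rep = [0, 0, 1, 1, 2, 2]    # IBEX model, reproducible features only
pyrad_rep = [1, 1, 0, 0, 2, 2]   # same groups, different label numbers

print("all features:", adjusted_rand_index(ibex_all, pyrad_all))
print("reproducible features:", adjusted_rand_index(ibex_rep, pyrad_rep))
```

Because the ARI compares pairs of patients rather than raw label values, it treats `[0, 0, 1, 1, 2, 2]` and `[1, 1, 0, 0, 2, 2]` as the same grouping, which matches the sense in which two models can classify patients into identical groups.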

Full Text