Reliability of MRI radiomics features in MR-guided radiotherapy for prostate cancer: Repeatability, reproducibility, and within-subject agreement.

Cindy Xue, Yin Kin Cheung, Jing Yuan, Yihang Zhou, Darren Mc Poon, Bin Yang, Siu Ki Yu

doi:10.1002/mp.15232

Abstract

MR-guided radiotherapy (MRgRT) images acquired on an integrated MRI and linear accelerator (MR-LINAC) might facilitate radiomics analysis for longitudinal treatment response assessment. However, the reliability of MRgRT radiomics features is largely unknown. This study aims to investigate the reliability of MRgRT radiomics features acquired using a standardized 3D-T2W-TSE sequence, in terms of repeatability, reproducibility, and within-subject feature agreement, on a 1.5T MR-simulator and a 1.5T MR-LINAC for prostate cancer (PC). Twenty-six consecutive PC patients who underwent one MR-simulator scan and two MR-LINAC scans before dose delivery were retrospectively included. The three MRI datasets were rigidly co-registered. A total of 1023 first-order and texture radiomics features were extracted with different intensity bin widths for each scan in the clinical target volume (CTV) and planning target volume (PTV), both manually segmented by an experienced radiation oncologist. The intraclass correlation coefficient (ICC) was used to evaluate feature repeatability between the MR-LINAC scans and reproducibility between the MR-simulator and MR-LINAC scans. Within-subject agreement of feature values was evaluated using Bland-Altman analysis. The impact of inter-observer segmentation on radiomics feature reliability was also examined based on a second manual segmentation of the CTV and PTV by an MRI researcher. Based on the segmentation by the radiation oncologist and the default bin width of 25, 9.6%, 24.1%, 49.6%, and 16.8% of the total 1023 features exhibited excellent (ICC>0.9), good (0.9>ICC>0.75), moderate (0.75>ICC>0.5), and poor (ICC<0.5) repeatability in the CTV, and 9.2%, 26.8%, 50.5%, and 13.5% in the PTV, respectively. For reproducibility, the corresponding feature percentages were 8.9%, 19.7%, 41.9%, and 29.6% in the CTV, and 8.4%, 17.8%, 47.9%, and 26% in the PTV. Feature reliability was not notably influenced by the intensity bin width used for discretization.
Bland-Altman analysis revealed wide 95% limits of agreement and substantial biases in feature values between the CTV and PTV and between any two MRI scans. Even features with excellent ICC were still subject to considerable inter-scan variation in each individual subject. The analysis of the second segmentation by the MRI researcher showed no significant difference in feature repeatability and reproducibility in terms of ICC values. Only a small proportion of features exhibited excellent/good repeatability and reproducibility, highlighting the importance of reliable MRgRT feature selection. The within-subject feature values were subject to considerable inter-scan variation, posing a challenge for determining the smallest detectable change in future MRgRT delta-radiomics studies.
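
The two reliability measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state which ICC form was used, so the sketch assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1). The function names (`icc_2_1`, `icc_category`, `bland_altman`) and the toy values are hypothetical, and the handling of values falling exactly on a category boundary is also an assumption.

```python
from statistics import mean, stdev


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: list of rows, one row per subject, one value per scan.
    ASSUMPTION: the abstract does not specify the ICC form; this is one common choice.
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of scans per subject
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares (no replication)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-scan mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_category(icc):
    """Map an ICC value to the four categories used in the abstract.

    ASSUMPTION: values exactly on a boundary fall into the lower category.
    """
    if icc > 0.9:
        return "excellent"
    if icc > 0.75:
        return "good"
    if icc > 0.5:
        return "moderate"
    return "poor"


def bland_altman(a, b):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical feature values for three subjects on two scans,
    # with a constant offset of 1 between the scans.
    toy = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
    icc = icc_2_1(toy)
    print(icc, icc_category(icc))
    print(bland_altman([r[0] for r in toy], [r[1] for r in toy]))
```

In the toy data the two scans rank the subjects identically but differ by a constant offset. Under the absolute-agreement form assumed here, that offset alone lowers the ICC to about 0.67, and the Bland-Altman bias captures the same offset directly as the mean paired difference.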

Full Text