Before implementing radiomics in routine clinical practice, comprehensive knowledge about the repeatability and reproducibility of radiomic features is required. The aim of this study was to systematically investigate the influence of image processing parameters on radiomic features from magnetic resonance imaging (MRI) in terms of feature values as well as test-retest repeatability. Utilizing a phantom consisting of 4 onions, 4 limes, 4 kiwifruits, and 4 apples, we acquired a test-retest dataset featuring 3 of the most commonly used MRI sequences on a 3 T scanner, namely, a T1-weighted, a T2-weighted, and a fluid-attenuated inversion recovery sequence, each at high and low resolution. After semiautomatic image segmentation, image processing with systematic variation of image processing parameters was performed, including spatial resampling, intensity discretization, and intensity rescaling. For each respective image processing setting, a total of 45 radiomic features were extracted, corresponding to the following 7 matrices/feature classes: conventional indices, histogram matrix, shape matrix, gray-level zone length matrix, gray-level run length matrix, neighboring gray-level dependence matrix, and gray-level cooccurrence matrix. Systematic differences of individual features between different resampling steps were assessed using 1-way analysis of variance with Tukey-type post hoc comparisons to adjust for multiple testing. Test-retest repeatability of radiomic features was measured using the concordance correlation coefficient, dynamic range, and intraclass correlation coefficient. Image processing influenced radiological feature values. Regardless of the acquired sequence and feature class, significant differences ( P < 0.05) in feature values were found when the size of the resampled voxels was too large, that is, bigger than 3 mm. Almost all higher-order features depended strongly on intensity discretization. The effects of intensity rescaling were negligible except for some features derived from T1-weighted sequences. For all sequences, the percentage of repeatable features (concordance correlation coefficient and dynamic range ≥ 0.9) varied considerably depending on the image processing settings. The optimal image processing setting to achieve the highest percentage of stable features varied per sequence. Irrespective of image processing, the fluid-attenuated inversion recovery sequence in high-resolution overall yielded the highest number of stable features in comparison with the other sequences (89% vs 64%-78% for the respective optimal image processing settings). Across all sequences, the most repeatable features were generally obtained for a spatial resampling close to the originally acquired voxel size and an intensity discretization to at least 32 bins. Variation of image processing parameters has a significant impact on the values of radiomic features as well as their repeatability. Furthermore, the optimal image processing parameters differ for each MRI sequence. Therefore, it is recommended that these processing parameters be determined in corresponding test-retest scans before clinical application. Extensive repeatability, reproducibility, and validation studies as well as standardization are required before quantitative image analysis and radiomics can be reliably translated into routine clinical care.
Read full abstract