Site‐specific investigations of the role of radiomics in cancer diagnosis and therapy are emerging. We evaluated the reproducibility of radiomic features extracted from 18Fluorine‐fluorodeoxyglucose (18F‐FDG) PET images with respect to three parameters: manual versus computer‐aided segmentation methods, gray‐level discretization, and PET image reconstruction algorithms. Our cohort consisted of pretreatment PET/CT scans from 88 cervical cancer patients. Two board‐certified radiation oncologists manually segmented the metabolic tumor volume (MTV1 and MTV2) for each patient. For comparison, we used a graphical‐based method to generate semiautomated segmented volumes (GBSV). To assess perturbations in radiomic feature values caused by discretization, we down‐sampled the tumor volumes from the original 256 gray levels to three gray levels: 32, 64, and 128. Finally, using PET images of eight patients, we analyzed the effect on radiomic features of four 3D PET reconstruction algorithms: the maximum likelihood‐ordered subset expectation maximization (OSEM) iterative reconstruction (IR) method, Fourier rebinning‐ML‐OSEM (FOREIR), FORE‐filtered back projection (FOREFBP), and the 3D‐Reprojection (3DRP) analytical method. We extracted 79 features for all segmentation methods, gray levels of down‐sampled volumes, and PET reconstruction algorithms. The features were extracted using gray‐level co‐occurrence matrices (GLCM), gray‐level size zone matrices (GLSZM), gray‐level run‐length matrices (GLRLM), neighborhood gray‐tone difference matrices (NGTDM), shape‐based features (SF), and intensity histogram features (IHF). We computed the Dice coefficient between each MTV and GBSV to measure segmentation accuracy. Coefficient values close to one indicate high agreement, and values close to zero indicate low agreement. We evaluated the effect on radiomic features by calculating the mean percentage difference (d̄) between feature values measured from each pair of parameter elements (i.e., segmentation methods: MTV1‐MTV2, MTV1‐GBSV, MTV2‐GBSV; gray levels: 64‐32, 64‐128, and 64‐256; reconstruction algorithms: OSEM‐FOREIR, OSEM‐FOREFBP, and OSEM‐3DRP). We used |d̄| as a measure of radiomic feature reproducibility, where any feature with |d̄| ± SD ≤ 25% ± 35% was considered reproducible. We used Bland–Altman analysis to evaluate the mean, standard deviation (SD), and upper/lower reproducibility limits (U/LRL) of radiomic features in response to variation in each testing parameter. Furthermore, we proposed U/LRL as a method to classify the level of reproducibility: High: ±1% ≤ U/LRL ≤ ±30%; Intermediate: ±30% < U/LRL ≤ ±45%; Low: ±45% < U/LRL ≤ ±50%. We considered any feature falling below the low level nonreproducible (NR). Finally, we calculated the intraclass correlation coefficient (ICC) to evaluate the reliability of radiomic feature measurements for each parameter. The segmented volumes of 65 patients (81.3%) scored a Dice coefficient >0.75 for all three volumes. The results revealed a tendency toward higher radiomic feature reproducibility for the segmentation pair MTV1‐GBSV than for MTV2‐GBSV, for the gray‐level pairs 64‐32 and 64‐128 than for 64‐256, and for the reconstruction algorithm pairs OSEM‐FOREIR and OSEM‐FOREFBP than for OSEM‐3DRP. Although the choice of cervical tumor segmentation method, gray‐level value, and reconstruction algorithm may affect radiomic features, some features were characterized by high reproducibility across all testing parameters. The number of radiomic features that showed insensitivity to variations in segmentation methods, gray‐level discretization, and reconstruction algorithms was 10 (13%), 4 (5%), and 1 (1%), respectively. These results suggest that a careful analysis of the effects of these parameters is essential prior to any clinical application of radiomics.
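The agreement measures described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the percentage difference is normalized by the pair mean, the reproducibility limits are taken as mean ± 1.96·SD (the conventional Bland–Altman choice), the "|d̄| ± SD ≤ 25% ± 35%" criterion is read as |d̄| ≤ 25% and SD ≤ 35%, and limits tighter than ±1% are treated as High. All function names are hypothetical.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)

def percent_differences(x, y):
    """Per-patient percentage difference between paired feature values.
    Assumption: normalized by the pair mean (undefined if that mean is 0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 100.0 * (x - y) / ((x + y) / 2.0)

def bland_altman_limits(d):
    """Return mean, sample SD, and lower/upper limits of the differences.
    Assumption: limits are mean -/+ 1.96 * SD."""
    d = np.asarray(d, dtype=float)
    mean = float(d.mean())
    sd = float(d.std(ddof=1))
    return mean, sd, mean - 1.96 * sd, mean + 1.96 * sd

def is_reproducible(mean, sd):
    """Assumed reading of the criterion: |mean| <= 25% and SD <= 35%."""
    return abs(mean) <= 25.0 and sd <= 35.0

def classify_reproducibility(lower, upper):
    """Map the wider of the two limits (in %) to the proposed levels."""
    worst = max(abs(lower), abs(upper))
    if worst <= 30.0:
        return "High"
    if worst <= 45.0:
        return "Intermediate"
    if worst <= 50.0:
        return "Low"
    return "NR"  # nonreproducible

# Toy usage with made-up values for one feature measured on two volumes.
feature_mtv1 = [1.10, 2.05, 0.98, 3.10]
feature_gbsv = [1.00, 2.00, 1.02, 3.00]
d = percent_differences(feature_mtv1, feature_gbsv)
mean, sd, lrl, url = bland_altman_limits(d)
level = classify_reproducibility(lrl, url)
```

In practice, each of the 79 features would be passed through the same steps for every parameter pair, so that the per-pair classifications can be compared.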