Background: Radiomics features have been used in a variety of studies to predict patient outcomes or aid in the diagnosis of non-small cell lung cancer. However, no guidelines exist for how best to calculate these features to maximize their prognostic potential. The purpose of the current study was to evaluate how different image preprocessing techniques affect both the volume dependence and the prognostic value of the features in univariate analyses.

Methods: Radiomics features from the histogram, co-occurrence matrix, neighborhood gray-tone difference matrix, and run-length matrix were calculated from computed tomography (CT) images of 107 non-small cell lung cancer tumors with volumes ranging from 5 to 567 cm³. Features were calculated from the images with no preprocessing, 8-bit depth resampling, Butterworth smoothing, or both 8-bit depth resampling and Butterworth smoothing. To determine which features were correlated with volume, we calculated the Spearman rank correlation coefficient (rs) for each combination of feature and preprocessing. For features that were very highly correlated with volume (rs > 0.95) regardless of which preprocessing algorithm was used, we modified the feature algorithm to normalize for volume and recalculated the volume correlation. To determine whether the preprocessing technique affected the prognostic usefulness of a feature, we fitted univariate Cox proportional hazards models for each feature under all four preprocessing techniques and calculated the P value. Additionally, the univariate Cox models were refitted using leave-one-out cross-validation to generate risk predictions, so that each patient received a predicted outcome from a model built without that patient's data. Prediction accuracy was assessed using Harrell's concordance index (c-index). Finally, the ability of each feature to improve model fit was examined using the P value of the log-likelihood ratio test between a model built with volume alone and a model built with volume plus one radiomics feature. The Benjamini-Hochberg procedure was used to correct for multiple comparisons.

Results: Five features were entirely volume dependent (busyness, coarseness, grey-level non-uniformity, run-length non-uniformity, and energy), and new algorithms were proposed for these features. Both the correlation with volume and the prognostic value of individual features changed substantially with different preprocessing techniques. In general, preprocessed features that were at least moderately correlated with volume (rs > 0.5) were more likely to be significant in the univariate analysis. Additionally, Butterworth smoothing, used either alone or in conjunction with 8-bit depth resampling, most often yielded features that were significant in univariate analysis.

Conclusions: Preprocessing can strongly affect both the volume dependence of a feature and its significance in univariate models. To create standardized features useful for multivariate modeling, it will be important to balance the usefulness of features against their volume dependence.
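The sketch below is a rough illustration of the univariate screening described in the Methods, not the authors' implementation. It assumes a hypothetical pandas DataFrame with columns "feature" (one radiomics feature under one preprocessing), "volume", "time" (survival time), and "event" (event indicator), and uses SciPy, lifelines, and statsmodels as the building blocks for the Spearman correlation, the leave-one-out cross-validated Cox model with Harrell's c-index, and the Benjamini-Hochberg correction.

```python
# Illustrative sketch only: column names and overall workflow are assumptions,
# not taken from the paper's code.
import numpy as np
from scipy.stats import spearmanr
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index
from statsmodels.stats.multitest import multipletests


def volume_correlation(df):
    """Spearman rank correlation (rs) between the feature and tumor volume."""
    rs, _ = spearmanr(df["feature"], df["volume"])
    return rs


def loocv_cindex(df):
    """Leave-one-out CV: fit a univariate Cox model on n-1 patients and
    predict the risk of the held-out patient; score with Harrell's c-index."""
    risks = np.empty(len(df))
    for i in range(len(df)):
        train = df.drop(index=df.index[i])
        cph = CoxPHFitter().fit(train[["feature", "time", "event"]],
                                duration_col="time", event_col="event")
        risks[i] = cph.predict_partial_hazard(df.iloc[[i]]).iloc[0]
    # concordance_index expects scores that increase with survival time,
    # so the predicted risks are negated.
    return concordance_index(df["time"], -risks, df["event"])


def bh_adjust(p_values):
    """Benjamini-Hochberg adjustment across all feature/preprocessing P values."""
    _, p_adj, _, _ = multipletests(p_values, method="fdr_bh")
    return p_adj
```

In practice these steps would be repeated for every feature and preprocessing combination, with the resulting P values pooled before applying the Benjamini-Hochberg correction.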