Purpose: To develop digital phantoms for characterizing inconsistencies among radiomics extraction methods, using three radiomics toolboxes: CERR (Computational Environment for Radiological Research), IBEX (Imaging Biomarker Explorer), and an in-house radiomics platform.
Materials and Methods: We developed a series of digital bar phantoms for characterizing intensity and texture features and a series of heteromorphic sphere phantoms for characterizing shape features. The bar phantoms consisted of n equal-width bars (n = 2, 4, 8, or 64), with the voxel values of the bars evenly distributed between 1 and 64. The heteromorphic sphere phantoms were constructed by starting from a perfect sphere and stochastically attaching smaller spheres to the phantom surface over 5500 iterations. We compared 61 commonly extracted features across the three toolboxes. The degree of inconsistency was quantified by the concordance correlation coefficient (CCC) and the Pearson correlation coefficient (PCC). Sources of discrepancies were characterized in terms of differences in mathematical definition, pre-processing, and calculation methods.
Results: For the intensity and texture features, only 53%, 45%, and 55% of features demonstrated perfect reproducibility (CCC = 1) for the in-house/CERR, in-house/IBEX, and CERR/IBEX comparisons, respectively; 71%, 61%, and 61% of features reached CCC > 0.8, and 25%, 39%, and 39% had CCC < 0.5. Meanwhile, most features demonstrated PCC > 0.95. For shape features, the toolboxes produced similar volume (CCC > 0.98) yet inconsistent surface area, leading to inconsistencies in other shape features. However, all toolbox comparisons yielded PCC > 0.8 for all shape features except compactness 1, for which inconsistent mathematical definitions were observed. Discrepancies in pre-processing and calculation implementations were identified with both types of phantoms.
Conclusions: Inconsistencies among radiomics extraction toolboxes can be accurately identified using the developed digital phantoms. The inconsistencies underscore the importance of implementing quality assurance (QA) of radiomics extraction for reproducible and generalizable radiomic studies. Digital phantoms are therefore useful tools for such QA.
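The bar phantom construction and the two agreement metrics described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 64-voxel cubic grid, the choice of bar axis, and the reading of "evenly distributed between 1 and 64" as equally spaced levels that include both endpoints are all assumptions made for illustration. The CCC follows Lin's standard definition using population (biased) moments.

```python
import numpy as np


def make_bar_phantom(n_bars, size=64, lo=1.0, hi=64.0):
    """Build a cubic phantom of n_bars equal-width bars along the first axis.

    Assumptions (not stated in the abstract): a size^3 grid, bars stacked
    along axis 0, and bar values equally spaced from lo to hi inclusive.
    """
    if n_bars < 2 or size % n_bars != 0:
        raise ValueError("n_bars must be >= 2 and divide size evenly")
    levels = np.linspace(lo, hi, n_bars)            # one value per bar
    profile = np.repeat(levels, size // n_bars)     # 1-D intensity profile
    # Copy the 1-D profile across the other two axes to obtain a volume.
    return np.broadcast_to(profile[:, None, None], (size, size, size)).copy()


def ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)


def pcc(x, y):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    phantom = make_bar_phantom(4)
    print(phantom.shape, np.unique(phantom))

    # Hypothetical feature values from two toolboxes that differ only by a
    # constant scale factor (illustrative numbers, not data from the study).
    toolbox_a = np.array([1.0, 2.0, 3.0, 4.0])
    toolbox_b = 2.0 * toolbox_a
    print("PCC:", pcc(toolbox_a, toolbox_b))
    print("CCC:", ccc(toolbox_a, toolbox_b))
```

With the hypothetical values above, the two feature vectors are perfectly linearly related, so PCC is 1, while CCC is 0.4 because CCC also penalizes differences in scale and offset. This is one way a feature can show high PCC yet low CCC between toolboxes, as reported in the Results.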