The purpose of this study is to investigate the potential of k-means clustering to efficiently reduce the variety of materials needed in Monte Carlo (MC) dose calculation. A numerical phantom with 31 human tissues surrounded by water is created. K-means clustering is used to group the tissues into clusters of constant elemental composition. Four different distance measures are used to perform the clustering: Euclidean, Standardized Euclidean, Chi-Squared and Cityblock. Dose distributions are calculated with MC simulations for both low-kV photons and MeV protons, using both the clustered and the reference elemental compositions. The dose distributions in the clustered and non-clustered phantoms are compared to assess the impact of clustering with each distance measure. The statistical significance of the differences between the four metrics is determined by comparing the accuracy of the energy absorption coefficients (EAC) of low-kV photons and of the proton stopping powers relative to water (SPR) over repeated clustering procedures. The performance of the proposed approach for a larger number of original materials is evaluated in the same way, using a population of 62 000 statistically generated materials grouped into classes defined with supervised and unsupervised classification. In the phantom geometry, the Chi-Squared distance introduces the smallest error in the dose distribution, and significant differences are observed between the EAC and SPR values predicted with each distance metric. The proposed approach is also shown to be equivalent to a state-of-the-art supervised classification method for proton therapy, but beneficial for low-kV photon applications. In conclusion, k-means clustering successfully reduces the variety of materials needed for accurate MC dose calculation.
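The abstract names the four distance measures but does not give the clustering details. The sketch below illustrates the general idea under stated assumptions. Each tissue is a vector of elemental mass fractions, tissues are assigned to the nearest centroid under the chosen distance, and each tissue is then replaced by its cluster's centroid composition. The symmetric chi-squared form 0.5·Σ(x−y)²/(x+y), the farthest-first seeding, the mean-based centroid update, and the four-element (H, C, N, O) compositions are all illustrative assumptions. They are not the authors' implementation or reference tissue data.

```python
import numpy as np

def pairwise_distance(X, C, metric, variances=None):
    """Distance from every row of X to every row of C, shape (len(X), len(C))."""
    diff = X[:, None, :] - C[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "seuclidean":
        # Each element's difference is scaled by that element's variance over the data set.
        return np.sqrt((diff ** 2 / variances).sum(axis=2))
    if metric == "chi2":
        # Assumed symmetric chi-squared form; 0/0 terms (element absent in both) count as 0.
        denom = X[:, None, :] + C[None, :, :]
        terms = np.divide(diff ** 2, denom, out=np.zeros_like(denom), where=denom > 0)
        return 0.5 * terms.sum(axis=2)
    if metric == "cityblock":
        return np.abs(diff).sum(axis=2)
    raise ValueError(f"unknown metric: {metric}")

def kmeans(X, k, metric="chi2", max_iter=100):
    """Lloyd-style k-means with a pluggable distance (illustrative variant)."""
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0)
    variances[variances == 0] = 1.0  # avoid dividing by zero for constant elements
    # Farthest-first seeding (an assumption): deterministic and spreads the seeds out.
    idx = [0]
    for _ in range(1, k):
        d = pairwise_distance(X, X[idx], metric, variances).min(axis=1)
        idx.append(int(d.argmax()))
    centroids = X[idx].copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        new_labels = pairwise_distance(X, centroids, metric, variances).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Hypothetical mass fractions (H, C, N, O) for six tissues; illustrative only.
tissues = np.array([
    [0.114, 0.598, 0.007, 0.281],  # adipose-like
    [0.112, 0.580, 0.010, 0.298],  # adipose-like
    [0.102, 0.143, 0.034, 0.721],  # muscle-like
    [0.103, 0.130, 0.036, 0.731],  # muscle-like
    [0.047, 0.300, 0.050, 0.603],  # third hypothetical group
    [0.050, 0.280, 0.045, 0.625],  # third hypothetical group
])

for metric in ("euclidean", "seuclidean", "chi2", "cityblock"):
    labels, centroids = kmeans(tissues, k=3, metric=metric)
    # Each tissue would be simulated with centroids[labels[i]] instead of its own composition.
    print(metric, labels)
```

Because the arithmetic mean of mass-fraction vectors still sums to one, every centroid is a valid composition. For the non-Euclidean distances, however, the mean is not the exact minimiser of within-cluster distance (for Cityblock that would be the median). This sketch is therefore a common k-means variant, not a strict k-medians or k-medoids method.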
Based on the performance of the four distance measures, we conclude that k-means clustering using the Chi-Squared distance introduces the smallest errors in the dose distribution. The method is also shown to yield similar or improved accuracy on key physical parameters (EAC and SPR) compared to supervised classification.