Bone ranks as the third most frequent tissue affected by cancer metastases, following the lung and liver. Bone metastases are often painful and may result in pathological fracture, which is a major cause of morbidity and mortality in cancer patients. Finite element (FE) analysis has been shown to be a promising tool for quantifying fracture risk, but metastatic lesions are typically not segmented specifically, so their mechanical properties may not be represented adequately. Deep learning methods potentially offer a way to segment these lesions automatically and to assign their mechanical properties more adequately. In this study, our primary focus was to gain insight into the performance of an automatic deep learning segmentation algorithm for femoral metastatic lesions and its subsequent effects on FE outcomes. The aims were to determine (1) the similarity between manual and automatic segmentation; (2) the differences in predicted failure load between FE models with automatically segmented osteolytic and mixed lesions and models with CT-based lesion values (the gold standard); and (3) the effect on the BOne Strength (BOS) score (failure load adjusted for body weight) and subsequent fracture risk assessments.
From two patient cohorts, a total of 50 femurs with osteolytic and mixed metastatic lesions were included in this study. The femurs were segmented from CT images and converted into FE meshes. The material behavior was implemented as non-linear isotropic. These FE models, in which the local calcium equivalent density of both femur and metastatic lesion was extracted from the CT values, were considered the gold standard (Finite Element no Segmented Lesion: FE-no-SL). Lesions in the femur were manually segmented by two biomechanical experts, after which the final lesion segmentation for each femur was obtained by consensus between the two observers.
Subsequently, nnU-Net, a self-configuring variant of the popular deep learning model U-Net, was used to automatically segment metastatic lesions within the femur. For the models with segmented lesions (Finite Element with Segmented Lesion: FE-with-SL), the calcium equivalent density within the lesions segmented by the neural network was set to zero, simulating the absence of load-bearing capacity in these lesions. The models (with or without automatically segmented lesions) were loaded incrementally in the axial direction until failure was simulated. The Dice coefficient was used to evaluate the similarity between the manual and automatic segmentations. Mean calcium equivalent density values within the automatically segmented lesions were calculated. Failure loads and failure patterns were determined. Furthermore, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for both groups by comparing the predictions to the occurrence or absence of actual fracture within the patient cohorts.
The automatic segmentation algorithm did not perform robustly. Dice coefficients describing the similarity between the consensus manual segmentations and the automatic segmentations were relatively low (mean 0.45 ± standard deviation 0.33, median 0.54). The difference in failure load between the FE-no-SL and FE-with-SL groups varied from 0 % to 48 % (mean 6.6 %). Correlation analysis of the failure loads of the two groups showed a strong relationship (R² > 0.9). In four of the 50 cases, there were clear deviations, with the models with automatic lesion segmentation (FE-with-SL) showing considerably lower failure loads.
In the whole database, including osteolytic and mixed lesions, sensitivity and NPV remained the same, but specificity decreased from 94 % to 83 % and PPV from 78 % to 54 % when going from FE-no-SL to FE-with-SL.
This study indicates that nnU-Net yielded non-robust outcomes in femoral lesion segmentation and that other segmentation algorithms should be considered. However, the differences in failure pattern and failure load between FE models with automatically segmented osteolytic and mixed lesions and the gold-standard models were relatively small in most cases, with a few exceptions. On the other hand, the accuracy of fracture risk assessment using the BOS score was lower for FE-with-SL than for FE-no-SL. In conclusion, this study showed that automatic lesion segmentation remains an unsolved problem and that quantifying lesion characteristics and their subsequent effect on fracture risk using deep learning will therefore remain challenging.
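The evaluation measures named in this abstract (Dice coefficient, zeroing of lesion density, and sensitivity, specificity, PPV, and NPV) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the convention for two empty masks, and the toy arrays are assumptions made for illustration only, and the fracture predictions are taken as given booleans because the BOS threshold is not stated here.

```python
import numpy as np


def dice_coefficient(manual, automatic):
    """Dice similarity between two binary lesion masks (hypothetical helper)."""
    manual = np.asarray(manual, dtype=bool)
    automatic = np.asarray(automatic, dtype=bool)
    total = manual.sum() + automatic.sum()
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    overlap = np.logical_and(manual, automatic).sum()
    return float(2.0 * overlap / total)


def zero_lesion_density(density, lesion_mask):
    """Return a copy of the density map with lesion voxels set to zero,
    mimicking the FE-with-SL assumption of no load-bearing capacity."""
    out = np.array(density, dtype=float, copy=True)
    out[np.asarray(lesion_mask, dtype=bool)] = 0.0
    return out


def diagnostic_metrics(predicted_fracture, actual_fracture):
    """Sensitivity, specificity, PPV and NPV from boolean predictions."""
    pred = np.asarray(predicted_fracture, dtype=bool)
    act = np.asarray(actual_fracture, dtype=bool)
    tp = int(np.sum(pred & act))
    fp = int(np.sum(pred & ~act))
    tn = int(np.sum(~pred & ~act))
    fn = int(np.sum(~pred & act))

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Toy data only; these values are not taken from the study.
    manual = [[0, 1, 1], [0, 1, 0]]
    automatic = [[0, 1, 0], [0, 1, 1]]
    print("Dice:", dice_coefficient(manual, automatic))

    density = [[0.8, 0.3, 0.2], [0.9, 0.1, 0.7]]
    print(zero_lesion_density(density, automatic))

    predicted = [1, 1, 0, 0, 1]
    actual = [1, 0, 0, 0, 1]
    print(diagnostic_metrics(predicted, actual))
```

In this sketch a false-positive lesion voxel lowers the Dice coefficient and also removes stiffness from the FE-with-SL density map, which is one way a poor segmentation could translate into a lower predicted failure load.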