Automated, U-Net-based cartilage segmentations have been reported to provide high sensitivity to cartilage thickness loss. Still, agreement between U-Net and manual segmentations is not perfect, in particular in OA knees, and outliers with incorrect segmentations may impair U-Net performance. The aim of this study was to evaluate the performance of automated, U-Net-based segmentations of MRIs with and without additional manual quality control and correction (QC&C), as well as the time effort required for QC&C. 2D U-Nets were trained separately for the medial (MFTC) and the lateral (LFTC) femorotibial compartment of the knee using manual segmentations from 3D MRI acquired by IMI-APPROACH (training/validation set: n=100/25). The U-Nets were then applied to 26 knees with accelerated MFTC cartilage thickness loss over 2 years and 16 knees without any cartilage loss (age: 67.2±6.8y, 57% females). Subsequently, the U-Net-based segmentations were checked and corrected by 5 experienced readers before additional expert quality control. QC&C required approximately 60% of the time of fully manual segmentation and was reported to be more tedious than performing segmentations from scratch. The agreement (DSC=Dice Similarity Coefficient) between automated and manual segmentations ranged from 0.83±0.12 to 0.89±0.05 for the 4 femorotibial cartilages before QC&C and from 0.89±0.06 to 0.92±0.03 after QC&C. The baseline cartilage thickness was greater for U-Net-based than for manual segmentations in both the MFTC (3.23±0.50mm, 95%CI: [3.07, 3.39] vs. 2.98±0.79mm, 95%CI: [2.73, 3.22]) and the LFTC (3.78±0.61mm, 95%CI: [3.59, 3.97] vs. 3.61±0.74mm, 95%CI: [3.38, 3.84]). After QC&C, the observed cartilage thickness (MFTC: 3.00±0.78mm, 95%CI: [2.76, 3.25]; LFTC: 3.66±0.7mm, 95%CI: [3.44, 3.88]) was closer to that obtained from manual segmentations.
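For readers unfamiliar with the agreement measure reported above, the Dice Similarity Coefficient between two binary masks A and B is DSC = 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch only: the function name, the flat 0/1 list input format, and the toy labels are assumptions made for illustration and are not taken from the study's pipeline.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient, DSC = 2|A ∩ B| / (|A| + |B|).

    Both masks are flat sequences of 0/1 voxel labels of equal length
    (a hypothetical input format chosen for this sketch).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled as cartilage in both segmentations
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    # Total cartilage voxels across both segmentations
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumed convention: two empty masks count as perfect agreement
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy example with made-up labels (not study data)
automated = [0, 1, 1, 1, 0, 0]
manual = [0, 0, 1, 1, 1, 0]
print(round(dice_coefficient(automated, manual), 3))
```

A DSC of 1 indicates identical masks and 0 indicates no overlap. In practice the masks would be 3D label volumes, evaluated separately for each cartilage plate.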
The 2-year cartilage thickness change in the MFTC of knees without cartilage loss was consistent across all 3 methods (manual: 0.01±0.07mm, 95%CI: [-0.03, 0.05]; without QC&C: 0.04±0.11mm, 95%CI: [-0.02, 0.10]; with QC&C: 0.05±0.12mm, 95%CI: [-0.01, 0.12]). In knees with cartilage loss, the MFTC change tended to be smaller in magnitude without QC&C (-0.22±0.29mm, 95%CI: [-0.34, -0.11]) and with QC&C (-0.22±0.17mm, 95%CI: [-0.28, -0.15]) than for manual segmentations (-0.32±0.17mm, 95%CI: [-0.39, -0.25]). The effect size (Cohen's d) for differences in change between knees with vs. without cartilage loss was greatest for manual segmentations (2.28), followed by U-Net segmentations with QC&C (1.78) and without QC&C (1.13). The sensitivity to change (SRM=standardized response mean) in the MFTC of knees with thickness loss was greatest for manual segmentations (-1.83), followed by the U-Net with QC&C (-1.30) and without QC&C (-0.78). The smaller change observed with U-Net-based segmentations may partially be attributed to a regression-to-the-mean effect caused by specifically selecting the knees with the greatest cartilage thickness loss. Still, manual QC&C improved the agreement between automated and manual segmentations and resulted in greater sensitivity both to differences in cartilage thickness loss and to change when compared with fully automated U-Net segmentations. This improvement, however, required a substantial amount of time.
FUNDING: EU/EFPIA Innovative Medicines Initiative Joint Undertaking (grant n° 115770).
DISCLOSURES: AW, SM, FE, WW: Chondrometrics GmbH; FWR: Boston Imaging Core Lab; CHL: Merck KGaA (during the conduct of the study) and grants from IMI-APPROACH.
ACKNOWLEDGMENTS: IMI-APPROACH (NCT03883568) participants and investigators.
CORRESPONDENCE ADDRESS: xiaosun@arizona.edu
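The two sensitivity measures reported in the abstract can be illustrated from the published summary statistics. This is a hedged sketch: the SRM is computed as mean change divided by the standard deviation of change, and Cohen's d is computed with a pooled standard deviation, which is a common definition but an assumption here because the abstract does not state which formula was used. The function names are hypothetical, and the inputs are the rounded MFTC values for manual segmentations given above.

```python
import math


def standardized_response_mean(mean_change, sd_change):
    """SRM: mean change divided by the standard deviation of change."""
    return mean_change / sd_change


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d using a pooled standard deviation (assumed definition)."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


# Manual segmentations, MFTC change over 2 years (rounded values from the abstract)
# Knees with cartilage loss:    -0.32 ± 0.17 mm, n = 26
# Knees without cartilage loss:  0.01 ± 0.07 mm, n = 16
srm_manual = standardized_response_mean(-0.32, 0.17)
d_manual = cohens_d(0.01, 0.07, 16, -0.32, 0.17, 26)

print(round(srm_manual, 2), round(d_manual, 2))
```

With these rounded inputs the sketch gives an SRM of about -1.88 and a Cohen's d of about 2.34, close to but not identical with the reported -1.83 and 2.28. The gap is plausibly due to rounding of the published means and standard deviations, or to a slightly different effect size formula.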