Preoperative intramuscular fat (IMF) is a strong predictor of tendon failure after rotator cuff repair. Because quantitative assessment of IMF currently requires labor-intensive, time-consuming manual segmentation, clinical implementation remains a challenge. Accurate three-dimensional (3D) evaluation of the rotator cuff may permit implementation with greater inter-rater reliability than common subjective scales (e.g., the Goutallier classification (GC)). Here, we developed and validated a convolutional neural network (CNN) model for auto-segmentation of the shoulder on Dixon MRI. We also aimed to assess the agreement between GC, two-dimensional (2D) IMF and 3D IMF, including their ability to discriminate muscles above an IMF threshold shown to negatively impact surgical outcomes (i.e., GC ≥ 3). This study retrospectively obtained fat-water Dixon shoulder MRIs acquired between March 2023 and March 2024 to develop and validate a CNN model for the segmentation of individual rotator cuff muscles and surrounding tissues. The CNN model was trained using a modified U-Net architecture (n = 80) and tested on an external dataset (n = 25). Accuracy was primarily evaluated using the Dice Similarity Coefficient (DSC) against manual segmentation. Reliability was evaluated by the intraclass correlation coefficient (ICC(2,1)), and discriminatory ability was evaluated by the area under the receiver operating characteristic curve (AUC). After training (37 male and 43 female, mean age = 55.8 ± 15.6 years) and testing (15 male and 10 female, mean age = 56.6 ± 19.7 years), the model produced DSCs of ≥ 0.89, except for teres minor (DSC = 0.86 ± 0.03). The model demonstrated excellent reliability for volume (ICC(2,1) ≥ 0.93) and good to excellent reliability for IMF (ICC(2,1) ≥ 0.80), with the exceptions of teres major volume (ICC(2,1) = 0.82, 95% CI: 0.63 - 0.92, p < 0.001) and subscapularis IMF (ICC(2,1) = 0.55, 95% CI: 0.22 - 0.77, p < 0.001).
3D IMF, but not 2D IMF, was associated with GC for the supraspinatus, subscapularis and infraspinatus (U ≥ 4.02, p < 0.045). The proposed CNN model's IMF outputs showed excellent discrimination of muscles above the IMF threshold shown to negatively impact outcomes (AUC ≥ 0.93). The CNN model allows efficient, accurate segmentation of muscle and bone, enabling reliable evaluation of muscle quality. The results indicate that 2D evaluation of IMF is insufficient for differentiating between rotator cuff muscles on either side of a clinically meaningful IMF threshold on the GC scheme, whereas 3D IMF shows excellent discriminant validity across all rotator cuff muscles.
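For readers unfamiliar with the metrics named above, the following is a minimal sketch of how they might be computed with NumPy. It is not the study's pipeline. The function names are hypothetical, the masks and images are assumed to be arrays of the same shape, and IMF is assumed here to be the mean voxel-wise Dixon fat fraction, fat / (fat + water), inside a muscle mask; the abstract does not state the exact formula used.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask):
    """Dice Similarity Coefficient between two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    denom = pred.sum() + true.sum()
    if denom == 0:
        # Both masks empty: treat as perfect agreement (a convention, not a rule).
        return 1.0
    return 2.0 * np.logical_and(pred, true).sum() / denom


def fat_fraction_percent(fat, water, mask):
    """Mean voxel-wise fat fraction (%) inside a mask.

    Assumption: IMF is approximated as fat / (fat + water) per voxel.
    """
    fat = np.asarray(fat, dtype=float)
    water = np.asarray(water, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    total = fat[mask] + water[mask]
    valid = total > 0  # skip voxels with no signal to avoid division by zero
    if not valid.any():
        return float("nan")
    return 100.0 * float(np.mean(fat[mask][valid] / total[valid]))


def auc_from_scores(scores, labels):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as one half (the rank-based, Mann-Whitney formulation).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative case")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))
```

In this sketch, a 3D IMF estimate would come from passing the full segmented volume as the mask, and a 2D estimate from passing a single-slice mask. For the AUC, the labels would mark muscles graded GC ≥ 3 and the scores would be the corresponding IMF values.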