BACKGROUND: MRI radiomics has been explored for three-tiered classification of breast cancer HER2 expression (i.e., HER2-zero, HER2-low, or HER2-positive), although understanding of how such models reach their predictions is lacking.
OBJECTIVE: To develop and test multiparametric MRI radiomics machine-learning models for differentiating three-tiered HER2 expression levels in patients with breast cancer, and to explain the contributions of model features through local and global interpretations using SHapley Additive exPlanation (SHAP) analysis.
METHODS: This retrospective study included 737 patients (mean age, 54.1±10.6 years) with breast cancer from two centers (center 1: n=578; center 2: n=159), who underwent breast MRI and had HER2 expression determined after excisional biopsy. Analysis entailed two tasks: differentiating HER2-negative (i.e., HER2-zero or HER2-low) from HER2-positive tumors (task 1), and differentiating HER2-zero from HER2-low tumors (task 2). For each task, patients from center 1 were randomly assigned in a 7:3 ratio to training (task 1: n=405; task 2: n=284) or internal test (task 1: n=173; task 2: n=122) sets; those from center 2 formed an external test set (task 1: n=159; task 2: n=105). Radiomics features were extracted from early-phase dynamic contrast-enhanced images (DCE), T2-weighted images (T2WI), and DWI. For each task, a support vector machine (SVM) was used for feature selection; a multiparametric radiomics score (radscore) was computed using feature weights from SVM correlation coefficients; conventional MRI and combined models were constructed; and model performances were evaluated. SHAP analysis was used to provide local and global interpretations for model outputs.
RESULTS: In the external test set, for task 1, AUCs for the conventional MRI model, radscore, and combined model were 0.624, 0.757, and 0.762, respectively; for task 2, the AUC for radscore was 0.754, and no conventional MRI model or combined model could be constructed. SHAP analysis identified early-phase DCE features as having the strongest influence for both tasks; T2WI features also had a prominent role for task 2.
CONCLUSION: The findings indicate suboptimal performance of MRI radiomics models for noninvasive characterization of HER2 expression.
CLINICAL IMPACT: The study provides an example of the use of SHAP interpretation analysis to better understand predictions of imaging-based machine learning models.
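ILLUSTRATIVE SKETCH: The abstract does not give implementation details, so the following is only a minimal sketch of the three quantities named in METHODS, under stated assumptions: the radscore is treated as a weighted sum of standardized features, AUC is computed as the probability that a positive case outscores a negative one, and local SHAP values are computed with the closed form for a linear score with independent features, w_i(x_i - mean_i). The function names, feature weights, and toy data are hypothetical and are not taken from the study.

```python
from statistics import mean


def radscore(features, weights, intercept=0.0):
    """Weighted sum of (assumed already standardized) radiomics features."""
    return intercept + sum(w * x for w, x in zip(weights, features))


def auc(scores, labels):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def linear_shap(features, weights, background):
    """Local SHAP values for a linear score, assuming independent features.

    Each value is w_i * (x_i - mean_i), where mean_i is the feature mean
    over the background (reference) rows.
    """
    means = [mean(col) for col in zip(*background)]
    return [w * (x - m) for w, x, m in zip(weights, features, means)]


def global_importance(rows, weights, background):
    """Global importance: mean absolute local SHAP value per feature."""
    local = [linear_shap(r, weights, background) for r in rows]
    return [mean(abs(v) for v in col) for col in zip(*local)]


# Toy, made-up data: one hypothetical feature each from DCE, T2WI, and DWI.
X = [[1.0, 0.2, -0.5], [0.8, -0.1, 0.3], [-0.9, 0.4, 0.1], [-1.2, -0.3, -0.2]]
y = [1, 1, 0, 0]           # 1 = HER2-positive, 0 = HER2-negative (toy labels)
w = [1.5, 0.5, -0.2]       # hypothetical weights, not from the study

scores = [radscore(x, w) for x in X]
print("AUC:", auc(scores, y))
print("local SHAP, patient 0:", linear_shap(X[0], w, X))
print("global importance:", global_importance(X, w, X))
```

In this sketch the local values for one patient sum to that patient's score minus the mean score over the background set, which is the additivity property that makes SHAP values readable as per-feature contributions. A nonlinear model would need a general-purpose SHAP estimator instead of this closed form.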