Abstract
Objective
To investigate whether radiomics features extracted from multi-parametric MRI, combined with a machine learning approach, can non-invasively predict the molecular subtype and androgen receptor (AR) expression of breast cancer.
Materials and Methods
Patients diagnosed with clinical stage T2–4 breast cancer from March 2016 to July 2020 were retrospectively enrolled. Molecular subtype and AR expression were assessed in pre-treatment biopsy specimens. A total of 4,198 radiomics features were extracted from each patient's pre-biopsy multi-parametric MRI, which included dynamic contrast-enhanced T1-weighted images, fat-suppressed T2-weighted images, and apparent diffusion coefficient (ADC) maps. Several feature selection strategies, namely the least absolute shrinkage and selection operator (LASSO), recursive feature elimination (RFE), maximum relevance minimum redundancy (mRMR), Boruta, and Pearson correlation analysis, were applied to select the optimal features. We then built 120 diagnostic models, combining distinct classification algorithms with feature sets grouped by MRI sequence and selection strategy, to predict the molecular subtype and AR expression of breast cancer in the testing dataset of leave-one-out cross-validation (LOOCV). The performance of the binary classification models was assessed by the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV); the performance of the multiclass classification models was assessed by AUC, overall accuracy, precision, recall, and F1-score.
Results
A total of 162 patients (mean age, 46.91 ± 10.08 years) were enrolled in this study; 30 had low AR expression and 132 had high AR expression. HR+/HER2− cancers were diagnosed in 56 cases (34.6%), HER2+ cancers in 81 cases (50.0%), and TNBC in 25 cases (15.4%).
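The evaluation described in Materials and Methods pools one held-out prediction per patient under LOOCV and then summarizes binary performance. The sketch below illustrates that workflow in plain Python under stated assumptions: labels are 0/1 with 1 as the positive class, and `loocv_predict`, `fit_midpoint`, `binary_metrics` and `auc` are hypothetical helper names. The one-feature midpoint-threshold classifier and the toy data are invented stand-ins and do not reproduce the authors' feature selection or classifiers (e.g. MLP). AUC is computed here as the Mann-Whitney probability that a random positive outscores a random negative.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Model = Callable[[float], int]


def loocv_predict(xs: List[float], ys: List[int],
                  fit: Callable[[List[float], List[int]], Model]) -> List[int]:
    """Leave-one-out CV: each sample is predicted by a model fit on the other n - 1."""
    preds = []
    for i in range(len(ys)):
        # In this sketch, any feature selection would live inside `fit`, seeing only the
        # training fold, so the held-out sample cannot influence which features are kept.
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


def fit_midpoint(xs: List[float], ys: List[int]) -> Model:
    """Hypothetical toy classifier on one feature: threshold halfway between class means."""
    m1 = mean(x for x, t in zip(xs, ys) if t == 1)
    m0 = mean(x for x, t in zip(xs, ys) if t == 0)
    cut = (m0 + m1) / 2
    if m1 > m0:
        return lambda x: int(x > cut)
    return lambda x: int(x < cut)


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV and NPV with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")  # NaN instead of ZeroDivisionError

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true-positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true-negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value (precision)
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical single-feature data, not from the study (1 = positive class)
x = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
y = [1, 1, 1, 0, 0, 0]
print(binary_metrics(y, loocv_predict(x, y, fit_midpoint)))

# Hypothetical pooled held-out predictions and scores with one error per class
print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
print(auc([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]))
```

Because each LOOCV fold holds out a single patient, a fold has no AUC of its own; in a setup like this the metrics are computed once over the pooled held-out predictions.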
There was no significant difference in clinicopathologic characteristics between the low-AR and high-AR groups (P > 0.05), except for menopausal status, ER, PR, HER2, and Ki-67 index (P = 0.043, <0.001, <0.001, 0.015, and 0.006, respectively). Likewise, no significant difference in clinicopathologic characteristics was observed among the three molecular subtypes, except for AR status and Ki-67 (P < 0.001 and P = 0.012, respectively). The Multilayer Perceptron (MLP) showed the best performance in discriminating AR expression, with an AUC of 0.907 and an accuracy of 85.8% in the testing dataset. MLP also achieved the highest performance in discriminating TNBC vs. non-TNBC (AUC: 0.965, accuracy: 92.6%), HER2+ vs. HER2− (AUC: 0.840, accuracy: 79.0%), and HR+/HER2− vs. others (AUC: 0.860, accuracy: 82.1%). The micro-AUC of the MLP multiclass classification model was 0.896, and its overall accuracy was 0.735.
Conclusions
Multi-parametric MRI-based radiomics combined with machine learning approaches provides a promising method for non-invasively predicting the molecular subtype and AR expression of breast cancer.
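For the three-class model, overall accuracy, per-class precision, recall and F1, and a micro-averaged AUC can be computed as sketched below. This assumes one-vs-rest binarization with micro-averaging over every (patient, class) score pair, which is one common definition of micro-AUC; the abstract does not state the authors' exact implementation. The function names, class label strings and six-patient probabilities are hypothetical.

```python
from typing import Dict, List, Sequence


def prf_per_class(y_true: Sequence[str], y_pred: Sequence[str],
                  classes: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 for each class, treating it as positive vs. the rest."""
    out = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = {"precision": prec, "recall": rec, "f1": f1}
    return out


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_auc(y_true: Sequence[str], prob_rows: Sequence[Sequence[float]],
              classes: Sequence[str]) -> float:
    """Micro-averaged one-vs-rest AUC: pool every (sample, class) score into one binary problem."""
    pos: List[float] = []
    neg: List[float] = []
    for t, row in zip(y_true, prob_rows):
        for c, s in zip(classes, row):
            (pos if t == c else neg).append(s)
    # Probability that a true-class score beats a wrong-class score (ties count 1/2)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical six-patient example (not study data); columns follow `classes`
classes = ["HR+/HER2-", "HER2+", "TNBC"]
y_true = ["HER2+", "HER2+", "TNBC", "HR+/HER2-", "HR+/HER2-", "TNBC"]
probs = [
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.1, 0.7],
]
# Predicted class = highest-probability column
y_pred = [classes[row.index(max(row))] for row in probs]

print(overall_accuracy(y_true, y_pred))
print(prf_per_class(y_true, y_pred, classes))
print(micro_auc(y_true, probs, classes))
```

For single-label data such as this, micro-averaged precision, recall and F1 all reduce to overall accuracy (here 4 of 6 correct), which is why per-class or macro-averaged values are usually reported alongside them.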
Highlights
Using the Multilayer Perceptron (MLP), the highest performances were obtained for discriminating triple-negative breast cancer (TNBC) vs. non-TNBC (AUC: 0.965, accuracy: 92.6%), human epidermal growth factor receptor-2 (HER2)+ vs. HER2− (AUC: 0.840, accuracy: 79.0%), and hormone receptor (HR)+/HER2− vs. others (AUC: 0.860, accuracy: 82.1%)
Multi-parametric magnetic resonance imaging (MRI)-based radiomics combined with machine learning approaches provides a promising method for non-invasively predicting the molecular subtype and androgen receptor (AR) expression of breast cancer
Summary
According to the International Agency for Research on Cancer, breast cancer has become the most prevalent cancer and the leading cause of cancer death in women worldwide [1]. It is crucial to detect the ER, PR, and …
Abbreviations: ER, estrogen receptor; PR, progesterone receptor; HER2, human epidermal growth factor receptor-2; TNBC, triple-negative breast cancer; HR, hormone receptor; MRI, magnetic resonance imaging; T1-DCE, T1-weighted dynamic contrast-enhanced images; DWI, diffusion-weighted images; FS-T2WI, fat-suppressed T2-weighted images; ADC, apparent diffusion coefficient; IHC, immunohistochemistry; FISH, fluorescence in situ hybridization; ROI, region of interest; LOOCV, leave-one-out cross-validation; LASSO, least absolute shrinkage and selection operator; RFE, recursive feature elimination; ROC, receiver operating characteristic curve; AUC, area under the receiver operating characteristic curve; pCR, pathological complete response; PCCM, Pearson correlation coefficient matrix; SD, standard deviation; 95% CI, 95% confidence interval; OS, overall survival; DFS, disease-free survival; LR, Logistic Regression; RF, Random Forest; SVM, Support Vector Machine; LDA, Linear Discriminant Analysis; GPC, Gaussian process classifier; GNB, Gaussian Naïve Bayes; MLP, Multilayer Perceptron; GLCM, gray level co-occurrence matrix; GLSZM, gray level size zone matrix; GLRLM, gray level run length matrix; NGTDM, neighborhood gray-tone difference matrix; NGLDM, neighboring gray level dependence matrix