Background: To develop a machine learning (ML) model for predicting the prognosis of breast cancer (BC) patients with low human epidermal growth factor receptor 2 (HER2) expression, and to investigate the association between clinicopathological characteristics and outcomes in HER2-low BC (HLBC) patients.

Methods: A retrospective analysis was conducted on data from 998 female HLBC patients treated at the Breast Center of the Fourth Hospital of Hebei Medical University (Hebei, China) between January 1, 2017, and December 31, 2020. To address class imbalance, the synthetic minority over-sampling technique (SMOTE) was applied. Feature selection was performed using the least absolute shrinkage and selection operator (LASSO), followed by construction of the prediction model using the random forest algorithm. Model performance, including specificity and accuracy, was assessed using the receiver operating characteristic (ROC) curve and the confusion matrix, and was compared with that of other ML models. Additionally, the log-rank test was used to examine the relationship between the selected features and patient outcomes in HLBC.

Results: The random survival forest model demonstrated superior accuracy and specificity in predicting survival outcomes for HLBC patients. Compared with other ML models, it predicted disease-free survival (DFS) at 1, 2, and 3 years more precisely, with areas under the ROC curve (AUCs) in the test and training cohorts of 0.726 and 0.819 at 1 year, 0.712 and 0.776 at 2 years, and 0.685 and 0.774 at 3 years. In both cohorts, poor prognosis in HLBC patients was strongly associated with axillary lymph node dissection, family history, elevated topoisomerase (TOPO)-2 expression, advanced clinical stage, negative progesterone receptor status, P53 mutation, and increased Ki67 expression.

Conclusions: A novel ML model was developed for accurate prognosis prediction in HLBC patients, offering valuable insights into prognostic risk factors. The model gives clinicians additional data to guide treatment decisions, which may contribute to improved patient outcomes.
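The evaluation metrics named in the abstract (specificity and accuracy from a confusion matrix, and the area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the authors' code: the labels, risk scores, and the 0.5 decision threshold are made-up toy values, and the function names are hypothetical. The sketch also treats the outcome at a single fixed time point as a plain binary label and ignores censoring, whereas the study reports AUCs for DFS at 1, 2, and 3 years.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary labels, where 1 marks the event."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn: int, fp: int, fn: int, tp: int) -> float:
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tn + fp + fn + tp)


def specificity(tn: int, fp: int) -> float:
    """Fraction of event-free cases correctly predicted as event-free."""
    return tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; tied scores count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration only): 1 = event by the chosen time point.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
scores = [0.10, 0.30, 0.35, 0.80, 0.40, 0.70, 0.90, 0.20]
threshold = 0.5  # arbitrary cut-off, not taken from the study
y_pred = [1 if s >= threshold else 0 for s in scores]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
acc = accuracy(tn, fp, fn, tp)
spec = specificity(tn, fp)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

The pairwise AUC here is quadratic in the number of cases, which is fine for a toy example. For a cohort of realistic size, a rank-based implementation from an established library would be the more practical choice.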