Background: The current standard for evaluating axillary nodal burden in clinically node-negative breast cancer is sentinel lymph node biopsy (SLNB). However, the accuracy of SLNB in detecting nodal stage N2-3 remains debatable. Nomograms can support the decision between axillary treatment options. The aim of this study was to create a new model that predicts nodal stage N2-3 after a positive SLNB, using machine learning methods that are rarely applied in nomogram development.

Material and methods: Patients with primary breast cancer who underwent SLNB and axillary lymph node dissection (ALND) between 2012 and 2017 formed the cohorts for nomogram development (training cohort, N = 460) and nomogram validation (validation cohort, N = 70). A machine learning method known as the gradient boosted trees model (XGBoost) was used to identify the variables associated with nodal stage N2-3 and to build a predictive model. Multivariable logistic regression analysis was used for comparison.

Results: In XGBoost modeling, the best combination of variables associated with nodal stage N2-3 comprised tumor size, histological type, multifocality, lymphovascular invasion, percentage of ER-positive cells, number of positive sentinel lymph nodes (SLNs), and the number of positive SLNs multiplied by tumor size. Discrimination, measured as the AUC, was 0.80 (95% CI 0.71–0.89) in the training cohort and 0.80 (95% CI 0.65–0.92) in the validation cohort for the XGBoost model, versus 0.85 (95% CI 0.77–0.93) and 0.75 (95% CI 0.58–0.89), respectively, for the logistic regression model.

Conclusions: The machine learning model maintained its discrimination in the validation cohort better than the logistic regression model did. This indicates advantages in employing modern artificial intelligence techniques in nomogram development. The nomogram could help identify nodal stage N2-3 in early breast cancer and guide the selection of appropriate treatment for patients.
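The two quantitative ingredients reported above, the product term (number of positive SLNs multiplied by tumor size) and the AUC with a 95% confidence interval, can be illustrated with a short sketch. This does not reproduce the authors' XGBoost or logistic regression models, and the abstract does not say how the confidence intervals were obtained; the sketch assumes a percentile bootstrap over patients, assumes tumor size in millimetres, and all function names are hypothetical. The AUC is computed as the Mann-Whitney probability that a randomly chosen N2-3 patient receives a higher predicted score than a randomly chosen N0-1 patient.

```python
import random
from typing import List, Sequence, Tuple


def interaction_feature(n_positive_sln: int, tumor_size_mm: float) -> float:
    """Hypothetical helper: number of positive SLNs multiplied by tumor size
    (tumor size assumed to be in mm)."""
    return n_positive_sln * tumor_size_mm


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic.

    labels: 1 = nodal stage N2-3, 0 = N0-1; scores: predicted risk.
    Returns the fraction of (positive, negative) pairs in which the positive
    case scores higher; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC (an assumed method, not
    necessarily the one used in the study). Patients are resampled with
    replacement; resamples containing only one class are skipped."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # both classes present, so the AUC is defined
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy, made-up data for illustration only (not study data).
    y = [0, 0, 1, 1, 0, 1]
    risk = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
    print("interaction term:", interaction_feature(2, 15.0))
    print("AUC:", round(auc(y, risk), 3))
    print("95% CI:", bootstrap_auc_ci(y, risk, n_boot=500))
```

Computing the same AUC separately on the training and validation cohorts is what exposes the pattern reported in the Results: the logistic regression AUC fell from 0.85 to 0.75, whereas the XGBoost AUC stayed at 0.80. With only 70 validation patients the bootstrap intervals are wide, which is consistent with the broad validation CIs in the abstract.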