Abstract PS8-38: Development of random forest classifier method to predict breast cancer mortality among early stage breast cancer patients

Arwen Chandler

doi:10.1158/1538-7445.sabcs20-ps8-38

Abstract

Background: Among women diagnosed with early stage breast cancer, breast cancer mortality is conditional on the onset of distant recurrence. Gene expression profiling (GEP) tests (e.g., Oncotype DX) are commonly used to determine the risk level of distant recurrence, but their predictive ability is often questioned. This study explores the potential of random forest machine learning to predict the risk of breast cancer mortality.

Method: A total of 63,104 patients with node-negative early stage breast cancer in the SEER-GHI dataset were randomly split into a training group and a testing group, comprising 67% and 33% of all patients, respectively. The patient characteristics and tumor features in the study included age, tumor size, grade, chemotherapy use, Oncotype DX test recurrence score, and more. Further feature selection for the predictive model was conducted via the Boruta algorithm. Because the dataset is highly imbalanced, with only a very small percentage of patients dying of breast cancer, the Synthetic Minority Over-sampling Technique (SMOTE) was employed to address this issue. A cross-validated grid search was used to find the combination of hyper-parameters that gave the best model performance.

Results: The mean and standard deviation of follow-up duration in the cohort of patients were 32.4 and 14.3 months, respectively. Of 37,043 patients, there were 158 breast cancer deaths. Over a follow-up period of up to 59 months, the breast cancer specific mortality rate was 0.43%. Based on the limited available data, the random forest classifier predicted breast cancer mortality with sensitivity 0.712, specificity 0.773, overall accuracy 0.773, and AUC (area under the ROC curve) 0.742.

Conclusion: The use of the random forest ML algorithm to predict breast cancer mortality is promising when the hyper-parameters are fine-tuned. However, the model's positive predictive value depends strongly on having sufficient data sources.

Citation Format: Arwen Chandler. Development of random forest classifier method to predict breast cancer mortality among early stage breast cancer patients [abstract]. In: Proceedings of the 2020 San Antonio Breast Cancer Virtual Symposium; 2020 Dec 8-11; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2021;81(4 Suppl):Abstract nr PS8-38.
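
The remark on positive predictive value (PPV) can be made concrete with a short calculation. The sketch below is illustrative only and is not part of the reported analysis: it takes the sensitivity, specificity, and death counts stated in the abstract and applies Bayes' rule. It assumes, purely for illustration, that the evaluation data had the same mortality prevalence as the 158 deaths among 37,043 patients; the function name `derived_metrics` is hypothetical.

```python
# Figures as reported in the abstract.
deaths, patients = 158, 37_043
sensitivity, specificity = 0.712, 0.773


def derived_metrics(sensitivity, specificity, prevalence):
    """Return (accuracy, ppv) implied by sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence                    # deaths flagged as deaths
    true_neg = specificity * (1.0 - prevalence)            # survivors flagged as survivors
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # survivors flagged as deaths
    accuracy = true_pos + true_neg
    ppv = true_pos / (true_pos + false_pos)
    return accuracy, ppv


# ASSUMPTION (not stated in the abstract): the evaluation set has the same
# breast cancer mortality prevalence as the overall cohort.
prevalence = deaths / patients
accuracy, ppv = derived_metrics(sensitivity, specificity, prevalence)

print(f"prevalence = {prevalence:.4%}")
print(f"accuracy   = {accuracy:.3f}")
print(f"PPV        = {ppv:.3%}")
```

Under that assumption the implied accuracy is close to the reported 0.773, while the implied PPV is on the order of 1%. With so few deaths, false positives among the many survivors outnumber the true positives, which is one way to read the caveat about data sufficiency.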
