Abstract

Introduction
Tumor metastasis is a major clinical challenge, accounting for the vast majority of cancer-related deaths. Previous studies predicted distant metastasis from tumor subtype, clinical status, and sometimes gene expression, but these approaches were difficult to apply clinically. In this study, we developed an easy-to-use prediction tool for distant metastasis using clinical characteristics and gene profiles from CancerSCAN™, a next-generation sequencing (NGS)-based targeted sequencing platform designed at Samsung Medical Center (SMC).

Methods
We performed a retrospective chart review of 326 breast cancer patients who underwent surgery and CancerSCAN™ testing between Jan 2001 and Dec 2014 at SMC. The median follow-up period was 83 months (range 1–190). CancerSCAN™ covers 381 genes, but 27 genes and 34 alteration events (loss of function, mutation, or copy number variation) were selected for analysis through gradient boosting and the Wilcoxon signed-rank test. The model was built in Azure Machine Learning, a cloud service for running machine learning workflows, in five steps: (1) edit the data, (2) split the data, (3) train the model, (4) score the model, and (5) evaluate the model. We split the modeling data into training and testing sets using a randomized 50–50 split. The Two-Class Decision Forest method was used. After deploying the Azure ML predictive model as a web service, we used a Representational State Transfer application programming interface (REST API) to send data and obtain predictions in real time.

Results
The no distant metastasis group and the distant metastasis group consisted of 267 and 59 patients, respectively. HR-/HER2+ tumors and age of 50 years or older were more frequent in the metastasis group (p = 0.003 and p < 0.001, respectively). Nuclear grade 3, N2, and N3 disease were more frequent in the metastasis group (p = 0.010, p < 0.001, and p = 0.001, respectively). Stage III disease was also more frequent in the metastasis group (p < 0.001).
Among the 59 patients with distant metastasis, multiple-site metastasis accounted for 21 cases (35.6%), followed by lung metastasis with 19 cases (32.2%). Of the 21 cases of multiple-site metastasis, 6 (28.6%) involved three sites and 15 (71.4%) involved two sites. PIK3CA mutation was the most frequent gene variation in all patients (34.5% of the no metastasis group and 27.1% of the metastasis group), but the difference between the two groups was not significant (p = 0.278). BRCA1 loss of function and BRCA2 loss of function were more frequent in the metastasis group than in the no metastasis group (p = 0.033 and p = 0.024, respectively), although the total counts were too small for firm conclusions. We assessed predictive value using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The AUC of the ROC curve was 1.000, and accuracy, precision, and recall were also 1.000. In addition, we conducted internal validation using 83 patients from 2015. When we applied a threshold of 0.5 to our predictive model, there were 81 true negatives and 2 true positives among the 83 patients, giving a validation accuracy of 1.000.

Conclusion
Our predictive model could be a useful and easy-to-access tool for identifying patients at risk of distant metastasis. After additional evaluation with larger datasets and external validation, the model could be used worldwide.

Citation Format: Kim I, Choi HJ, Ryu JM, Lee SK, Yu JH, Kim SW, Nam SJ, Seo SW, Lee JE. A predictive model for distant metastasis in breast cancer patients using machine learning [abstract]. In: Proceedings of the 2018 San Antonio Breast Cancer Symposium; 2018 Dec 4-8; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2019;79(4 Suppl):Abstract nr P2-08-52.
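
As an illustration of the split and evaluation steps described in the Methods and Results, the following Python sketch shows a randomized 50–50 split and how accuracy, precision, and recall follow from confusion-matrix counts at a 0.5 threshold. It is not the authors' Azure Machine Learning workflow. The function names, the fixed seed, and the convention that a score of at least 0.5 counts as a positive prediction are assumptions made for this example. The validation counts (81 true negatives and 2 true positives among 83 patients) come from the abstract, which implies zero false positives and zero false negatives.

```python
import random


def split_half(items, seed=0):
    """Randomized 50-50 split. The fixed seed is an assumption, used only
    to make this illustration reproducible."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def confusion_counts(y_true, scores, threshold=0.5):
    """Count TP, FP, TN, FN. Assumption: score >= threshold means 'metastasis'."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, precision, recall


# Patient indices stand in for the 326 records of the modeling cohort.
train_ids, test_ids = split_half(range(326))

# Counts reported for the 2015 validation cohort: 81 TN and 2 TP among
# 83 patients, so FP and FN must both be 0.
validation = metrics(tp=2, fp=0, tn=81, fn=0)
print(len(train_ids), len(test_ids), validation)
```

With the reported counts, all three metrics evaluate to 1.0, and the split assigns 163 of the 326 patients to each set. Because the validation cohort contains only two positive cases, a single missed positive would lower recall to 0.5.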