Abstract

Compared with Western countries, breast cancer tends to occur at a younger age in Taiwanese women: the median age at diagnosis is approximately 45–49 years in Taiwan, whereas it is approximately 70–74 years in Western countries. With early detection and treatment techniques that greatly improve prognosis and survival, the risk of recurrence in breast cancer survivors has increased significantly, and how to monitor cancer recurrence carefully has long been an important issue for doctors and patients. According to statistics from the Health Promotion Administration of the Ministry of Health and Welfare (2021), the 5-year survival rate of the top ten cancers showed an upward trend from 2012 to 2018. In Taiwan, the promotion of cancer screening in recent years has made early detection and diagnosis feasible. Coupled with advances in chemotherapy, more patients now survive longer; however, longer survival is accompanied by the possibility of recurrence.

The objective of this study is to use machine learning to identify risk factors and clinical features in order to develop a predictive model for breast cancer recurrence. Clinical datasets were collected between 2009 and 2018 from the cancer registries of four hospitals in Taiwan; there were a total of 5,788 valid records, including 749 recurrent cases. Based on a literature review and consultation with clinical experts, nine independent (predictive) variables and one dependent variable were defined. The nine risk factors for recurrence were (1) grade/differentiation, (2) tumor size, (3) clinical stage group, (4) pathologic stage group, (5) surgical margin involvement of the primary site, (6) surgery, (7) radiotherapy, (8) chemotherapy, and (9) BMI. The chi-square test was employed to determine the presence of significant associations between variables.

The study process comprised the following steps: (1) collecting data from the cancer registration centers of four hospitals; (2) identifying the important risk factors of breast cancer recurrence by consulting clinical experts and the literature; (3) cleaning and recoding the data; (4) ranking the variables in order of importance using gain ratio and information gain; and (5) dividing the datasets into training and test datasets (ratio 7:3), repeating the random sampling ten times, and then predicting with six machine learning classifiers and analyzing the results clinically.

The classification techniques adopted were linear discriminant analysis, logistic regression, C4.5 decision tree, classification and regression tree (CART), random forest, and C5.0 decision tree. The synthetic minority oversampling technique (SMOTE) was used to adjust for the imbalance between sample categories. In the random forest, all classification trees are merged, and the final classification result is obtained by majority voting. Logistic regression is a classification algorithm rather than a regression method: known independent variables are used to predict the value of a discrete dependent variable, and the probability of an event is predicted by fitting a logistic (logit) function, so the output value lies between 0 and 1. Classification performance was evaluated using accuracy, sensitivity, specificity, and receiver operating characteristic analysis, with the area under the curve (AUC) as the summary estimate.

The study results demonstrated that the most important risk factor for breast cancer recurrence is pathological stage, followed by surgical and clinical stages.
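As a rough illustration of how predictive variables can be ranked by information gain and gain ratio, the sketch below computes both measures for categorical variables against a binary recurrence label. It is not the study's implementation: the function names, the variable names, and the eight toy records are all hypothetical and chosen only to make the arithmetic visible.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a categorical variable."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the variable itself."""
    split_info = entropy(values)
    if split_info <= 0:
        return 0.0  # a constant variable carries no information
    return information_gain(values, labels) / split_info


# Hypothetical toy records (NOT study data): 1 = recurrence, 0 = none.
recurrence = [1, 1, 1, 0, 0, 0, 0, 0]
toy_variables = {
    "pathologic_stage": ["late", "late", "late", "early",
                         "early", "early", "early", "late"],
    "radiotherapy": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
}

# Rank variables from most to least informative by gain ratio.
ranking = sorted(toy_variables,
                 key=lambda name: gain_ratio(toy_variables[name], recurrence),
                 reverse=True)
print(ranking)
```

Gain ratio divides information gain by the entropy of the variable itself, which penalizes variables with many distinct categories; this is the usual reason for reporting both measures side by side.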
For patients aged <50 years with breast cancer, CART performed best, with accuracy 0.7907, sensitivity 0.7438, specificity 0.8059, and AUC 0.8493. For patients aged ≥50 years, CART also performed best, with accuracy 0.8349, sensitivity 0.7925, specificity 0.8489, and AUC 0.874. Furthermore, for patients aged <50 years, the rule derived from the clinical evidence is to track residual tumor at the surgical resection margin and subsequent chemotherapy for residual tumor after surgery, with the clinical stage record serving as a reference in the early stage (pathological stage <2B). For patients aged ≥50 years, the results of this study are consistent with previous reports (Chang et al., 2019). The key to the overall assessment of recurrence is to consider the pathological stage (advanced stage ≥2B), postoperative follow-up (including radiotherapy and residual tumor at the surgical resection margin), observation of BMI, and, in particular, subsequent chemotherapy as well as the classification and differentiation of the tumor. The predictive classification analysis showed that the CART method produces the best classification and the most promising results for predicting recurrence in breast cancer survivors, and that it provides indicators of the importance of the predictive variables. By using the decision tree model, clinicians may be able to identify factor combinations for the conditions of interest.
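For readers less familiar with the reported measures, the following minimal sketch shows how accuracy, sensitivity, specificity, and AUC can be derived from true labels and predicted scores. The function names, the 0.5 decision threshold, and the toy scores are assumptions made for illustration and are unrelated to the study's data or software; the sketch also assumes that both classes are present.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = recurrence."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on recurrences), and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions (NOT study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
print(metrics, auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of measure are typically reported together.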
