Abstract
The collection of large volumes of medical data has given the medical research community an opportunity to develop prediction models for survival. Medical researchers who seek to discover and extract hidden patterns and relationships among large numbers of variables use knowledge discovery in databases (KDD) to predict the outcome of a disease. This study was conducted to develop predictive models and to discover relationships between certain predictor variables and survival in the context of breast cancer. The study design was cross-sectional. After data preparation, the records of 22,763 female patients (mean age 59.4 years) stored in the Surveillance, Epidemiology, and End Results (SEER) breast cancer dataset were analyzed anonymously. IBM SPSS Statistics 16, Access 2003 and Excel 2003 were used for data preparation, and IBM SPSS Modeler 14.2 was used for model design. The Support Vector Machine (SVM) model outperformed the other models in predicting breast cancer survival. The SVM model identified ten important predictor variables that contributed most to the prediction of breast cancer survival. Among these, behavior of the tumor was the most important variable and stage of malignancy was the least important. In the current study, applying the knowledge discovery method to the breast cancer dataset predicted the survival status of breast cancer patients with high confidence and identified the most important variables involved in breast cancer survival.
Highlights
Breast cancer is the most common malignancy among women and causes a large number of neoplastic deaths worldwide
In the current study, applying the knowledge discovery method to the breast cancer dataset predicted the survival status of breast cancer patients with high confidence and identified the most important variables involved in breast cancer survival
Knowledge discovery in databases (KDD) is a process consisting of an iterative sequence of the following steps: understanding the research domain, understanding the data used in that domain, handling missing values and removing irrelevant or redundant variables, applying methods to extract data patterns, and presenting the resulting knowledge (Delen et al., 2005; Han, Kamber, & Pei, 2011)
Summary
Breast cancer is the most common malignancy among women and causes a large number of neoplastic deaths worldwide. Once a patient is diagnosed with breast cancer, the malignant lump must be excised. During this procedure, physicians must determine the prognosis of the disease. Survival analysis is a field of medical prognosis that applies various methods to data stored in health datasets in order to predict the survival of a particular patient suffering from a disease over a particular time period (Delen, Walker, & Kadam, 2005). Health researchers who seek to discover and extract hidden patterns and relationships among large numbers of variables use knowledge discovery in databases (KDD) to predict the outcome of a disease (Bellazzi et al., 2011; Cios & William Moore, 2002). KDD is a process consisting of an iterative sequence of the following steps: understanding the research domain (i.e., the health domain), understanding the data used in that domain, handling missing values and removing irrelevant or redundant variables (data preparation), applying methods to extract data patterns (data mining), and presenting the resulting knowledge (Delen et al., 2005; Han, Kamber, & Pei, 2011).
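The data preparation step described above (handling missing values and removing redundant variables) can be illustrated with a minimal sketch. It is not the study's procedure, which used SPSS, Access and Excel. The records, field names, mean imputation and constant-column rule below are all assumptions chosen for illustration, and the field names are not SEER column names.

```python
from statistics import mean

# Hypothetical toy records; field names are illustrative, not SEER columns.
records = [
    {"age": 61, "tumor_size_mm": 22, "registry": "A", "survived": 1},
    {"age": None, "tumor_size_mm": 35, "registry": "A", "survived": 0},
    {"age": 48, "tumor_size_mm": None, "registry": "A", "survived": 1},
    {"age": 70, "tumor_size_mm": 41, "registry": "A", "survived": 0},
]


def impute_missing(rows, columns):
    """Replace None in each listed numeric column with that column's mean.

    Mean imputation is an assumed strategy; the source does not say
    how missing values were handled.
    """
    filled = [dict(r) for r in rows]  # copy so the input stays untouched
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        col_mean = mean(observed)
        for r in filled:
            if r[col] is None:
                r[col] = col_mean
    return filled


def drop_constant_columns(rows, target):
    """Remove predictors that take a single value across all rows.

    A constant column cannot help separate outcomes, so it is treated
    here as redundant. The target column is always kept.
    """
    constant = [
        c for c in rows[0]
        if c != target and len({r[c] for r in rows}) == 1
    ]
    kept = [{k: v for k, v in r.items() if k not in constant} for r in rows]
    return kept, constant


prepared = impute_missing(records, ["age", "tumor_size_mm"])
prepared, removed = drop_constant_columns(prepared, target="survived")
print("removed columns:", removed)
print("second record after preparation:", prepared[1])
```

The prepared rows would then be passed to the data mining step, for example a classifier such as the SVM mentioned in the abstract. That step is omitted here.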