Breast Cancer Likelihood Prediction Using Bayesian Networks

G Nanda, R Sundararajan

doi:10.9734/bpi/cimms/v4/16996d

Abstract

Breast cancer is the most common cancer among women worldwide. Early detection of breast cancer using machine learning algorithms, with validation by doctors, can lead to earlier diagnosis and treatment, longer patient survival, lower treatment costs, and fewer deaths. To this end, we examined Bayesian Network machine learning models for analyzing large medical datasets and tested several approaches to improve accuracy in predicting the likelihood of breast cancer and detecting it early. We used the machine learning library WEKA (Waikato Environment for Knowledge Analysis) and the Breast Cancer Surveillance Consortium (BCSC) dataset, which contains 53,370 screening records. The dataset includes thirteen variables; the variable “breast_cancer_history” was treated as the class variable to be predicted, with “class 0” denoting a negative breast cancer diagnosis and “class 1” a positive one. We built three Bayesian Networks using different structure-determination algorithms: K2 (Hill Climbing Algorithm), TAN (Tree Augmented Naive Bayes), and Simulated Annealing (SA). We compared their network structures and their performance, measured by Recall and Precision. Our results indicate that the Bayesian Network generated by SA achieved the best overall prediction performance, followed by TAN and K2. For class 1, Recall was highest for K2, followed by SA and TAN, and Precision was highest for SA, followed by TAN and K2. Comparing the network structures showed that SA produced the most complex network, with the most connections between variable nodes, followed by TAN and K2.
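
The networks are compared using per-class Recall and Precision. As a minimal sketch of how these two measures are computed for a chosen class, assuming binary labels 0 and 1 as in the abstract, the following Python snippet may help. The function name `precision_recall` and the toy label lists are hypothetical illustrations only; they are not taken from the study, from WEKA, or from the BCSC data.

```python
from collections import Counter


def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    # Count each (actual, predicted) pair once.
    counts = Counter(zip(y_true, y_pred))
    tp = sum(n for (t, p), n in counts.items() if t == positive and p == positive)
    fp = sum(n for (t, p), n in counts.items() if t != positive and p == positive)
    fn = sum(n for (t, p), n in counts.items() if t == positive and p != positive)
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels (not BCSC records): 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

p1, r1 = precision_recall(y_true, y_pred, positive=1)
p0, r0 = precision_recall(y_true, y_pred, positive=0)
print(f"class 1: precision={p1:.3f}, recall={r1:.3f}")
print(f"class 0: precision={p0:.3f}, recall={r0:.3f}")
```

Reporting the measures separately for each class matters when one class is much rarer than the other, because a model can score well on the majority class while missing many minority-class cases.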
