Prediction Breast Cancer as Benign or Malignant in Apache Spark Framework

Wafaa S Albaldawi, Rafah M Almuttairi

doi:10.1088/1757-899x/928/3/032046

Abstract

A number of diseases contribute to the number of deaths around the world, and breast cancer can be considered the most common of them. There is therefore a need to apply classification and other data mining methods to health datasets to support diagnosis and decision making. In this paper, a Support Vector Classifier model, a Logistic Regression algorithm, and a Random Forest algorithm are applied to the publicly available Wisconsin Breast Cancer dataset. The experiments are executed in a Scala environment on both single-node and multi-node Spark clusters. The results show that the Support Vector Classifier model achieves high accuracy with a low error rate and a shorter running time compared with other studies. Authentication in Spark is enabled in the application using the shared secret method.
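To make the described workflow concrete, the following is a minimal sketch of how the three classifiers might be compared with Spark MLlib in Scala, with shared-secret authentication turned on through the standard `spark.authenticate` and `spark.authenticate.secret` properties. It is not the authors' code. Several choices are assumptions made only for illustration: a headered CSV file at the hypothetical path `wdbc.csv` with an `id` column, a `diagnosis` column holding `B`/`M`, and numeric feature columns; the hypothetical environment variable `BC_SPARK_SECRET`; a 70/30 train/test split; and the iteration and tree counts. MLlib's `LinearSVC` (a linear support vector machine) stands in for the Support Vector Classifier.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.{LinearSVC, LogisticRegression, RandomForestClassifier}
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.{DataFrame, SparkSession}

object BreastCancerSketch {
  // Trains each classifier on a 70/30 split and returns (name, test accuracy, seconds).
  // Assumes an "id" column, a "diagnosis" column ("B"/"M"), and numeric feature columns.
  def compare(df: DataFrame, seed: Long = 42L): Seq[(String, Double, Double)] = {
    val featureCols = df.columns.filterNot(c => c == "id" || c == "diagnosis")
    // Map the string diagnosis to a numeric "label" column (MLlib's default label column).
    val indexer = new StringIndexer().setInputCol("diagnosis").setOutputCol("label")
    // Pack the numeric feature columns into a single "features" vector column.
    val assembler = new VectorAssembler().setInputCols(featureCols).setOutputCol("features")
    val Array(train, test) = df.randomSplit(Array(0.7, 0.3), seed)

    val classifiers: Seq[(String, PipelineStage)] = Seq(
      "SVC" -> new LinearSVC().setMaxIter(100),
      "LogisticRegression" -> new LogisticRegression().setMaxIter(100),
      "RandomForest" -> new RandomForestClassifier().setNumTrees(100).setSeed(seed)
    )
    val evaluator = new MulticlassClassificationEvaluator().setMetricName("accuracy")

    classifiers.map { case (name, clf) =>
      val start = System.nanoTime()
      val model = new Pipeline().setStages(Array[PipelineStage](indexer, assembler, clf)).fit(train)
      val accuracy = evaluator.evaluate(model.transform(test))
      // Elapsed time covers both fitting and evaluating the pipeline.
      (name, accuracy, (System.nanoTime() - start) / 1e9)
    }
  }

  def main(args: Array[String]): Unit = {
    // Shared-secret authentication via the standard spark.authenticate properties.
    // BC_SPARK_SECRET is a hypothetical environment variable used only in this sketch.
    val conf = new SparkConf()
      .setAppName("BreastCancerSketch")
      .set("spark.authenticate", "true")
      .set("spark.authenticate.secret", sys.env.getOrElse("BC_SPARK_SECRET", "change-me"))
    val spark = SparkSession.builder().config(conf).getOrCreate()

    // "wdbc.csv" is a hypothetical path to a headered CSV export of the WDBC data.
    val path = if (args.nonEmpty) args(0) else "wdbc.csv"
    val df = spark.read.option("header", "true").option("inferSchema", "true").csv(path)

    for ((name, acc, secs) <- compare(df))
      println(f"$name%-20s accuracy=$acc%.4f error=${1 - acc}%.4f time=$secs%.2fs")
    spark.stop()
  }
}
```

The same jar can be submitted to a single-node or a multi-node cluster without code changes, since the master URL is supplied by `spark-submit` rather than hard-coded; on a cluster, every node must share the same secret.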

Full Text