Abstract

Identifying the disease is the first step in diagnosing breast cancer, and early detection is important for reducing breast cancer mortality. Machine learning algorithms can support this identification. Supervised learning algorithms such as the Support Vector Machine (SVM) and the decision tree are widely used in classification problems, including the identification of breast cancer. In this study, a machine learning model is proposed using two learning algorithms: the support vector machine and the decision tree. A breast cancer dataset from the Kaggle data repository, consisting of 569 observations labelled malignant or benign, is used to develop the proposed model. The model is evaluated on the test set using accuracy, the confusion matrix, precision and recall as performance metrics. The results show that the SVM achieves higher accuracy, fewer misclassifications and better precision than the decision tree. The average accuracy of the SVM is 91.92 %, and that of the decision tree classification model is 87.12 %.
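The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes the 569-observation Wisconsin diagnostic data bundled with scikit-learn can stand in for the Kaggle file, and the split ratio, RBF kernel, feature scaling and random seeds are illustrative choices that the abstract does not specify.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled dataset (569 rows, 30 features)
# stands in for the Kaggle file used in the study.
X, y = load_breast_cancer(return_X_y=True)  # y: 0 = malignant, 1 = benign

# Assumption: a stratified 75/25 split; the study's split is not stated here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # SVMs are sensitive to feature scale, so standardize first.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Treat malignant (label 0) as the positive class.
        "precision": precision_score(y_test, y_pred, pos_label=0),
        "recall": recall_score(y_test, y_pred, pos_label=0),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1]),
    }
    print(name, results[name])
```

The numbers this sketch prints depend on the split and hyperparameters, so they should not be expected to match the accuracies reported in the abstract.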

Highlights

  • The Support Vector Machine (SVM) and the decision tree are supervised machine learning models used in classification, regression and outlier detection [1,2]

  • This paper addresses the following questions: 1) What accuracy do the SVM and the decision tree algorithm achieve on breast cancer identification? 2) Can a machine learning model for breast cancer identification be developed with an acceptable level of accuracy?

  • Many supervised machine learning models, such as SVM, decision tree, AdaBoost, Naive Bayes and neural networks, can be used for disease classification and diagnosis. The accuracy and the training complexity differ from one algorithm to another


Summary

RELATED WORKS

The Support Vector Machine (SVM) and the decision tree are supervised machine learning models used in classification problems such as disease classification and diagnosis. A breast tumor is either malignant or benign, and the Kaggle breast cancer dataset used to train and test the models contains two classes. This implies that the SVM model can be applied to the breast cancer classification and diagnosis problem. In [4], a hybrid approach is used for feature extraction and mass classification of breast cancer. Another machine learning study on breast cancer diagnosis, using the Support Vector Machine (SVM) and an artificial neural network, was proposed in [5]. The authors compared Naive Bayes, K-Nearest Neighbour (K-NN) and the Support Vector Machine (SVM) using the Wisconsin Breast Cancer dataset. Accuracy, the confusion matrix, recall, the receiver operating characteristic (ROC) and precision are used as evaluation metrics to assess the performance of the proposed model on the Kaggle breast cancer data repository.
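To make the evaluation metrics named above concrete, the sketch below derives accuracy, precision and recall from the four cells of a two-class confusion matrix. The function names and the toy labels ("M" for malignant, "B" for benign) are hypothetical and chosen only for illustration; they are not taken from the paper or its dataset.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, y_pred, positive):
    """Return accuracy, precision and recall as a dict of floats."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: of the cases flagged positive, how many were positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of the truly positive cases, how many were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy labels, not data from the study.
y_true = ["M", "M", "M", "B", "B", "B", "B", "B"]
y_pred = ["M", "M", "B", "B", "B", "B", "M", "B"]

counts = confusion_counts(y_true, y_pred, positive="M")
metrics = summarize(y_true, y_pred, positive="M")
print(counts)   # (2, 1, 1, 4)
print(metrics)  # accuracy 0.75, precision and recall both 2/3
```

In a screening setting, recall on the malignant class is often the metric to watch, since a false negative corresponds to a missed cancer.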
