Abstract

Breast cancer is one of the most dreaded diseases and a potential cause of death in women. The death rate due to breast cancer increases drastically every year. Data mining classification techniques are an effective way to classify data, and they are especially useful in the medical field, where they support diagnosis and analysis. This paper uses the Wisconsin Breast Cancer dataset to compare SVM, Logistic Regression, Naïve Bayes and Random Forest. The main objective is to determine the efficiency of each algorithm by evaluating how correctly it classifies the data, measured in terms of accuracy and time consumption. In the experiments performed, the Random Forest algorithm shows the highest accuracy (99.76%) and the lowest error rate. All experiments were executed in a simulated environment on the ANACONDA Data Science Platform.

Highlights

  • Being the most frequently occurring cancer in women, breast cancer affects around 10% of women at some point in their life

  • The paper analyses performance and compares classification accuracy across Logistic Regression, SVM, Random Forest and Naïve Bayes, which are among the most influential data mining algorithms in the research community

  • The authors of [3] use the machine learning classification algorithms Naïve Bayes and C4.5, which is commonly used in data mining, together with ANN, a neural network algorithm, to classify breast cancer tumours in the dataset


Summary

INTRODUCTION

Being the most frequently occurring cancer in women, breast cancer affects around 10% of women at some point in their lives. The paper analyses performance and compares classification accuracy across Logistic Regression, SVM, Random Forest and Naïve Bayes, which are among the most influential data mining algorithms in the research community. Each of these algorithms classifies the data independently, so no two of them produce the same classification result or analysis. Put simply, no two trees in a Random Forest are alike, so every tree applies its own classification methodology. This is useful in cases like this one, where a medical dataset has to be classified. SVM creates a hyperplane, which acts as a threshold for classifying the data; this threshold is learned from the dataset during training. Evaluating the efficiency of the algorithms is the primary objective of this project.
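The comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn is available, uses the Wisconsin Diagnostic copy of the dataset that ships with scikit-learn (which may differ from the variant used in the paper), and the function name compare_classifiers, the 75/25 split, the linear SVM kernel and all hyperparameters are assumptions chosen for the example. It is not expected to reproduce the accuracy figures reported in the paper.

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(test_size=0.25, seed=0):
    """Train four classifiers and report accuracy, error rate and fit time.

    Hypothetical helper for illustration; split and settings are assumptions.
    """
    # Wisconsin Diagnostic Breast Cancer data bundled with scikit-learn.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    models = {
        # SVM and Logistic Regression are scale-sensitive, so standardize first.
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Naive Bayes": GaussianNB(),
        "Random Forest": RandomForestClassifier(
            n_estimators=100, random_state=seed
        ),
    }

    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)  # time consumption: training only
        fit_seconds = time.perf_counter() - start

        accuracy = accuracy_score(y_test, model.predict(X_test))
        results[name] = {
            "accuracy": accuracy,
            "error_rate": 1.0 - accuracy,
            "fit_seconds": fit_seconds,
        }
    return results


if __name__ == "__main__":
    for name, r in compare_classifiers().items():
        print(f"{name}: accuracy={r['accuracy']:.4f}, "
              f"error={r['error_rate']:.4f}, fit={r['fit_seconds']:.4f}s")
```

Accuracy and timing both depend on the split, the random seed and the machine, so a single run like this gives only a rough ranking; averaging over several seeds or using cross-validation would give a steadier comparison.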

RELATED WORK
EXPERIMENT
EXPERIMENTAL RESULTS
RESULTS AND DISCUSSION
CONCLUSION
