Abstract

Breast cancer is the most common cancer among women worldwide. This paper analyses the performance of supervised and unsupervised models for breast cancer classification, using data from the Wisconsin Breast Cancer Dataset. Features are prepared through scaling and principal component analysis (PCA). The final results indicate that the Ensemble Voting approach is the best predictive model for breast cancer among those evaluated. The raw data contains 569 cases, which are split into training and testing sets in a 70:30 ratio. A benchmark model is first created using the Random Forest method. Various models are then trained and tested on the data after feature scaling and PCA. Cross-validation shows that our model is stable. Among all the evaluated models, only four, i.e., the Ensemble Voting Classifier, Logistic Regression, tuned SVM and AdaBoost, achieved an accuracy of at least 98%. Based on the precision, recall, ROC-AUC, F1-measure and computational time of the models, the Ensemble Voting Classifier showed the most potential for breast cancer classification on the given dataset.
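To make the voting and evaluation terms concrete, the following is a minimal pure-Python sketch of majority (hard) voting across three classifiers, followed by precision, recall and F1. It is an illustration only. The abstract does not state whether hard or soft voting was used, so hard voting is an assumption here. The function names `hard_vote` and `precision_recall_f1` are hypothetical, and the labels are toy values, not data or results from the Wisconsin dataset.

```python
from collections import Counter


def hard_vote(per_model_predictions):
    """Combine label predictions from several models by majority vote.

    per_model_predictions: list of equal-length label lists, one per model.
    With an odd number of models and binary labels there are no ties.
    """
    combined = []
    for votes in zip(*per_model_predictions):
        # most_common(1) returns [(label, count)] for the most frequent label
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (1 = malignant, 0 = benign); purely illustrative values.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 0, 0, 0]
model_b = [1, 0, 0, 1, 0, 1, 1, 0]
model_c = [1, 1, 1, 1, 0, 0, 0, 0]

ensemble = hard_vote([model_a, model_b, model_c])
precision, recall, f1 = precision_recall_f1(y_true, ensemble)
print(ensemble, precision, recall, round(f1, 3))
```

In this toy case each individual model makes at least one mistake that the other two outvote, which is the intuition behind using a voting ensemble. The one remaining error, a missed positive, lowers recall but leaves precision unaffected.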
