Abstract

Objective: To propose an automated diagnostic system for the early detection of breast cancer. Methods: Machine learning algorithms are used to classify a tumor as either malignant or benign while relying on the minimum number of image features. A comparative study of several classification approaches, namely Decision Tree, Support Vector Machine, K-Nearest Neighbor and Random Forest, has also been conducted, with cross validation used to identify the best-performing model. Findings: The study shows that the Random Forest classifier gives the highest accuracy among the models compared. It also highlights that cross validation and fine tuning are necessary to prevent overfitting. Improvements: The selection of parameters plays a very important role in correct classification, as multicollinearity among attributes can render classifier models ineffective.

Keywords: Breast Cancer, Classification, Cross Validation, Decision Tree, K-Nearest Neighbor, Logistic Regression, Random Forest, Support Vector Machine
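
The role of cross validation in tuning a classifier can be illustrated with a minimal sketch. It is not the study's code: the data are synthetic two-feature points generated by a hypothetical helper (`make_synthetic_tumors`), only K-Nearest Neighbor is shown, and the 5-fold split and the candidate values of k are assumptions chosen for illustration. Each candidate k is scored only on samples held out from training, which is what guards against picking a setting that merely memorizes the training data.

```python
import random
from collections import Counter
from math import dist


def make_synthetic_tumors(n_per_class=60, seed=0):
    """Hypothetical stand-in data, NOT the study's dataset.

    Each sample is ((feature_1, feature_2), label) with label 0 (benign)
    or 1 (malignant), drawn from two overlapping Gaussian clusters.
    """
    rng = random.Random(seed)
    samples = []
    for label, center in ((0, (1.0, 1.0)), (1, (2.5, 2.5))):
        for _ in range(n_per_class):
            point = (rng.gauss(center[0], 0.8), rng.gauss(center[1], 0.8))
            samples.append((point, label))
    rng.shuffle(samples)  # mix the classes so every fold sees both
    return samples


def knn_predict(train, point, k):
    """Majority vote among the k training samples closest to `point`."""
    neighbors = sorted(train, key=lambda s: dist(s[0], point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def k_fold_indices(n, n_folds):
    """Split range(n) into n_folds contiguous, near-equal test folds."""
    bounds = [round(i * n / n_folds) for i in range(n_folds + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(n_folds)]


def cross_val_accuracy(samples, k, n_folds=5):
    """Mean accuracy of k-NN on held-out folds."""
    scores = []
    for test_idx in k_fold_indices(len(samples), n_folds):
        held_out = set(test_idx)
        train = [s for i, s in enumerate(samples) if i not in held_out]
        correct = sum(
            knn_predict(train, samples[i][0], k) == samples[i][1]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


samples = make_synthetic_tumors()
# Odd values of k avoid tied votes in a two-class problem.
cv_scores = {k: cross_val_accuracy(samples, k) for k in (1, 3, 5, 7, 9)}
best_k = max(cv_scores, key=cv_scores.get)

for k, score in cv_scores.items():
    print(f"k={k}: mean CV accuracy = {score:.3f}")
print("selected k:", best_k)
```

The same loop could in principle be repeated for each model family named in the abstract, comparing their mean held-out accuracies rather than their training accuracies.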
