Comparison of machine learning models for breast cancer diagnosis

Rania R Kadhim, Mohammed Y Kamil

doi:10.11591/ijai.v12.i1.pp415-421

Abstract

Breast cancer is the most common cause of cancer death among women worldwide. Early detection can reduce the death rate. Machine learning techniques are an active research topic and have proved effective in cancer prediction and early diagnosis. The objective of this study is to predict and diagnose breast cancer using machine learning models and to identify the most effective model based on six criteria: specificity, sensitivity, precision, accuracy, F1-score, and the receiver operating characteristic (ROC) curve. All work was done in the Anaconda environment, using Python's NumPy and SciPy numerical and scientific libraries together with pandas and matplotlib. The study used the Wisconsin diagnostic breast cancer dataset to test ten machine learning algorithms: decision tree, linear discriminant analysis, forests of randomized trees, gradient boosting, passive aggressive, logistic regression, naïve Bayes, nearest centroid, support vector machine, and perceptron. We then evaluated and compared the performance of these classification techniques. The gradient boosting model outperformed all other algorithms, with an F1-score of 96.77%.
