An Investigation on the Use of Clustering Algorithms for Data Preprocessing in Breast Cancer Diagnosis

Ali Şenol, Mahmut Kaya

doi:10.46810/tdfd.1364397

Abstract

Classification algorithms are commonly used in decision support systems for diagnosing many diseases, including breast cancer. Their accuracy can suffer when the data contain outliers or noise, so outlier detection methods are frequently used in this field. In this study, we propose and compare several models that use clustering algorithms to detect outliers during the data preprocessing stage of classification, and we investigate their effect on classification accuracy. The clustering algorithms DBSCAN, HDBSCAN, OPTICS, FuzzyCMeans, and MCMSTClustering (MCMST) were each used separately for outlier elimination in the preprocessing stage of the k-Nearest Neighbor (kNN) classifier, and the results were compared. According to the results, the MCMST algorithm was the most successful at outlier elimination: the kNN + MCMST model achieved the best classification accuracy, 0.9834, whereas kNN without any data preprocessing achieved 0.9719.
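The preprocessing idea described in the abstract (cluster the data, drop the points the clustering marks as outliers, then classify with kNN) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it uses a plain-Python DBSCAN-style noise test in place of the algorithms compared in the study, and the toy points, labels, `eps`, `min_samples`, and `k` are all assumed values chosen only for demonstration.

```python
import math
from collections import Counter


def dbscan_noise_mask(points, eps, min_samples):
    """Return a list of booleans, True where a DBSCAN-style rule marks noise."""
    n = len(points)
    # Each neighborhood includes the point itself, as in the usual definition.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    # Noise: not a core point and not within eps of any core point.
    return [
        not is_core[i] and not any(is_core[j] for j in neighbors[i])
        for i in range(n)
    ]


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points nearest to the query."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


# Toy data (hypothetical): two tight groups plus one isolated point.
X = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (3.0, 3.0),
]
y = ["benign"] * 4 + ["malignant"] * 4 + ["benign"]

# Step 1: flag outliers with the clustering rule (assumed parameters).
noise = dbscan_noise_mask(X, eps=0.5, min_samples=3)

# Step 2: keep only the points that were not flagged.
clean_X = [p for p, is_noise in zip(X, noise) if not is_noise]
clean_y = [label for label, is_noise in zip(y, noise) if not is_noise]

# Step 3: classify a new sample with kNN trained on the cleaned data.
prediction = knn_predict(clean_X, clean_y, query=(4.8, 4.8), k=3)
print(len(X) - len(clean_X), "point(s) removed; prediction:", prediction)
```

In practice the noise threshold and neighborhood size would need tuning per dataset, and the accuracy figures reported in the abstract come from the authors' own models, not from a sketch like this one.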

Journal: Türk Doğa ve Fen Dergisi
Publication Date: Mar 26, 2024
License type: cc-by-nc
