Abstract

This work aims to distinguish cancerous (malignant) from non-cancerous (benign) cells in a breast cancer database. The Wisconsin Breast Cancer (WBC) dataset, obtained from the University of California, Irvine (UCI) Machine Learning Repository, was used. The proposed approach combines a Gaussian Naive Bayes classifier, which models each feature with a Gaussian distribution, with a Chi-squared-based attribute selection method. The experiments were performed after reducing the dimensionality of the data with extended Kernel Principal Component Analysis (K-PCA). After the necessary pre-processing, five different kernels were tested in K-PCA. The performance of the proposed system was evaluated using confusion-matrix-based accuracy, precision, sensitivity, and specificity. The proposed methodology, with six selected features and sigmoid-kernel K-PCA, achieved the best accuracy of 99.28%. This result outperforms many recently published state-of-the-art studies on the same dataset.
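
The four evaluation measures named above all derive from the binary confusion matrix. The following is a minimal sketch of their standard definitions, not the authors' evaluation code. It assumes "malignant" is the positive class, and the helper names (`ConfusionCounts`, `count_outcomes`, `metrics`) are hypothetical. The toy labels in the usage lines are invented for illustration and are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; 'malignant' is assumed to be the positive class."""
    tp: int  # malignant samples predicted malignant
    fp: int  # benign samples predicted malignant
    tn: int  # benign samples predicted benign
    fn: int  # malignant samples predicted benign


def count_outcomes(y_true, y_pred, positive="malignant"):
    """Tally TP/FP/TN/FN from paired sequences of true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1
        elif actual != positive and predicted == positive:
            fp += 1
        elif actual != positive and predicted != positive:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _safe_div(num, den):
    # Return 0.0 instead of raising ZeroDivisionError for empty denominators.
    return num / den if den else 0.0


def metrics(c: ConfusionCounts) -> dict:
    """Accuracy, precision, sensitivity (recall) and specificity from the counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _safe_div(c.tp + c.tn, total),
        "precision": _safe_div(c.tp, c.tp + c.fp),
        "sensitivity": _safe_div(c.tp, c.tp + c.fn),  # true-positive rate
        "specificity": _safe_div(c.tn, c.tn + c.fp),  # true-negative rate
    }


if __name__ == "__main__":
    # Invented toy labels, for illustration only.
    y_true = ["malignant"] * 3 + ["benign"] * 4
    y_pred = ["malignant", "malignant", "benign", "benign", "benign", "benign", "malignant"]
    counts = count_outcomes(y_true, y_pred)
    print(counts)           # ConfusionCounts(tp=2, fp=1, tn=3, fn=1)
    print(metrics(counts))
```

Sensitivity and specificity are reported separately because, in a diagnostic setting, missing a malignant case (a false negative) and flagging a benign case (a false positive) carry different costs, which a single accuracy figure can hide.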
