Abstract: In this paper we propose a novel Shapley Value Embedded Genetic Algorithm (SVEGA) that improves breast cancer diagnosis accuracy by selecting a gene subset from high-dimensional gene data. In particular, the embedded Shapley value provides two memetic operators, “include” and “remove”, which add or drop features (genes) to realize the genetic algorithm (GA) solution. The method ranks the genes by their ability to differentiate the classes and selects those that maximize the discrimination between classes. As a result, the dimensionality of the feature space is reduced and the classification accuracy improves. Four classifiers, namely support vector machine (SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN) and J48, are applied to the breast cancer dataset from the Kent Ridge biomedical repository to distinguish normal from abnormal tissue and to diagnose tumours as benign or malignant. The classification accuracy obtained indicates that the proposed method diagnoses breast cancer more accurately than existing methods.
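The abstract gives no implementation details, so the following is only a minimal sketch of how Shapley-value-guided "include" and "remove" operators could be embedded in a GA for feature selection. It is not the authors' implementation. All names (`shapley_estimates`, `include_op`, `remove_op`, `refine`, `run_ga`, `toy_value`) are hypothetical, the Shapley values are estimated by Monte Carlo permutation sampling, and the toy scoring function stands in for what would in practice be a cross-validated classifier accuracy on the selected genes.

```python
import random

def shapley_estimates(n_features, value, n_perm=200, seed=0):
    """Monte Carlo Shapley estimate: average marginal gain over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n_features
    for _ in range(n_perm):
        order = list(range(n_features))
        rng.shuffle(order)
        subset, prev = set(), value(set())
        for f in order:
            subset.add(f)
            cur = value(subset)
            phi[f] += cur - prev  # marginal contribution of feature f
            prev = cur
    return [p / n_perm for p in phi]

def include_op(mask, phi):
    """Memetic 'include': switch on the best-ranked feature not yet selected."""
    off = [i for i, bit in enumerate(mask) if not bit]
    if off:
        mask[max(off, key=lambda i: phi[i])] = 1
    return mask

def remove_op(mask, phi):
    """Memetic 'remove': switch off the worst-ranked selected feature."""
    on = [i for i, bit in enumerate(mask) if bit]
    if len(on) > 1:  # never empty the subset
        mask[min(on, key=lambda i: phi[i])] = 0
    return mask

def refine(child, phi, fitness):
    """Local search: apply include/remove while either strictly improves fitness."""
    improved = True
    while improved:
        improved = False
        for op in (include_op, remove_op):
            trial = op(child[:], phi)
            if fitness(trial) > fitness(child):
                child, improved = trial, True
    return child

def run_ga(n_features, value, phi, pop_size=20, generations=30, seed=0):
    """Elitist GA over bit masks with Shapley-guided refinement of each child."""
    rng = random.Random(seed)
    fitness = lambda m: value({i for i, bit in enumerate(m) if bit})
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:               # occasional bit-flip mutation
                child[rng.randrange(n_features)] ^= 1
            children.append(refine(child, phi, fitness))
        pop = parents + children
    return max(pop, key=fitness)

# Toy problem (assumption): 10 "genes", of which 3 carry signal.
INFORMATIVE = {0, 3, 7}

def toy_value(subset):
    # Stand-in for classifier accuracy: reward signal genes, penalise subset size.
    return len(subset & INFORMATIVE) - 0.1 * len(subset)

phi = shapley_estimates(10, toy_value)
best = run_ga(10, toy_value, phi)
selected = [i for i, bit in enumerate(best) if bit]
print(selected)
```

On this additive toy score the local search alone recovers the informative genes; with a real classifier-based fitness the score is not additive, so the Shapley ranking only guides the search and the GA's crossover and mutation do more of the work.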