Cancer is one of the leading causes of death across the globe. There is a need for early diagnosis to improve the chance of successful treatment and reduce the mortality associated with cancer. Due to the availability of highly specialized cancer datasets, molecular classification of cancer by gene expression, machine learning, and deep learning, a part of artificial intelligence (AI) techniques is used in detecting the disease. The application of several classification and feature selection methods on microarray gene expression datasets helps learn models that are able to predict a given disease. However, the tremendous dimensionality of the microarray cancer dataset is the greatest challenge in interpreting the data. In this work, the optimal feature subsets are selected by combining the correlation‐based feature selection (CFS) technique with five distinct meta‐heuristic search methods: evolutionary search (ES), particle swarm optimization search (PSOS), genetic search (GS), harmony search (HS), and multiobject evolutionary search (MOES). Furthermore, a CFS‐MOES (correlation‐based feature selection—multiobject evolutionary search) ensemble model is proposed based on a majority voting mechanism to improve the classification performance. Six microarray cancer datasets are considered, and seven traditional classifiers are evaluated on those datasets. Three classifiers, namely, K‐nearest neighbour (KNN), multilayer perceptron (MLP), and random forest (RF), were chosen as the base classifiers based on their F‐measure score. The features chosen by our proposed CFS‐MOES method significantly improve the accuracy of the proposed model. Moreover, the proposed model has also been compared with the other ensemble models generated using CFS‐ES (correlation‐based feature selection —evolutionary search), CFS‐PSOS (correlation‐based feature selection—particle swarm optimization search), CFS‐GS (correlation‐based feature selection—genetic search), and CFS‐HS (correlation‐based feature selection—harmony search) feature selection methods, ensuring better classification accuracy with a reduced feature subset. This model is also evaluated using significant parameters such as precision, recall, F‐measure, accuracy, Matthews correlation coefficient (MCC), and mean absolute error (MAE). According to the experimental results, our proposed model has a remarkable accuracy of 98.83% for breast cancer and 98.79% for cervical cancer.
Read full abstract