Gene selection is a pivotal step in machine-learning-driven medical diagnostics: the goal is to identify, from microarray expression profiles, a subset of genes that improves the predictive accuracy of classifiers for disease diagnosis. Gene selection has two key objectives, reducing the dimensionality of the data and improving diagnostic accuracy, so it is naturally formulated as a multi-objective optimization problem. In recent years, multi-objective evolutionary algorithms (MOEAs) have attracted wide attention in feature selection research, and several such algorithms have been proposed. However, most of them tend to become trapped in local optima when searching a high-dimensional space. To solve the gene selection problem effectively, this study introduces two algorithms: a recursive multi-objective differential evolution algorithm with an elite recursive strategy (RMODE-E) and a recursive multi-objective differential evolution algorithm with a Pareto front recursive strategy (RMODE-P). RMODE-E combines the features selected by the top E elite individuals, whereas RMODE-P combines the features selected by the Pareto front set; in both cases, the combined features form the search space for the next recursive round. This feature subspace combination strategy not only shrinks the search space of each recursive round but also improves the ability to find globally optimal feature subsets. Extensive experiments compare the proposed algorithms with eight state-of-the-art evolutionary algorithms to validate their effectiveness. The results show that RMODE-P has better global search capability, achieving higher best and mean classification accuracy as well as smaller gene subsets.
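The subspace combination step described above can be illustrated with a minimal sketch. It assumes that each individual is encoded as a binary gene mask and evaluated on two minimized objectives, classification error and number of selected genes. The helper names (`pareto_front`, `elite_subset`, `combine_subspace`, `restrict_columns`), the rule that ranks elites by error and then by subset size, and the toy population are illustrative assumptions, not the authors' implementation. The differential evolution operators and the classifier are omitted.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: (binary gene mask, classification error, subset size).
# Both objectives are minimized.
Candidate = Tuple[Sequence[int], float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    _, err_a, size_a = a
    _, err_b, size_b = b
    no_worse = err_a <= err_b and size_a <= size_b
    strictly_better = err_a < err_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(pop: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (RMODE-P style)."""
    return [c for c in pop if not any(dominates(o, c) for o in pop)]


def elite_subset(pop: List[Candidate], e: int) -> List[Candidate]:
    """Top-e candidates (RMODE-E style); ranking by error, then size, is an assumption."""
    return sorted(pop, key=lambda c: (c[1], c[2]))[:e]


def combine_subspace(chosen: List[Candidate]) -> List[int]:
    """Union of the gene indices selected by any of the chosen candidates."""
    selected = set()
    for mask, _, _ in chosen:
        selected.update(i for i, bit in enumerate(mask) if bit)
    return sorted(selected)


def restrict_columns(rows: List[List[float]], subspace: List[int]) -> List[List[float]]:
    """Project the expression data onto the combined genes for the next recursive round."""
    return [[row[i] for i in subspace] for row in rows]


# Toy population over 6 genes (made-up values for illustration only).
pop: List[Candidate] = [
    ([1, 1, 0, 0, 0, 0], 0.05, 2),  # A
    ([0, 0, 1, 0, 0, 0], 0.12, 1),  # B: worse error than A, but fewer genes
    ([1, 0, 0, 1, 1, 0], 0.08, 3),  # C: dominated by A
    ([0, 1, 0, 0, 0, 1], 0.20, 2),  # D: dominated by A
]

if __name__ == "__main__":
    print(combine_subspace(pareto_front(pop)))    # [0, 1, 2]
    print(combine_subspace(elite_subset(pop, 2)))  # [0, 1, 3, 4]
```

In this toy case the Pareto-front union keeps genes 0, 1 and 2, while the elite union with E = 2 keeps genes 0, 1, 3 and 4. Either index list would then define the reduced space that the next recursive round of differential evolution searches, for example via `restrict_columns`.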