With the widespread adoption of high-throughput sequencing technology in recent years, high-dimensional gene sequence datasets have grown rapidly in scale. Their high dimensionality makes such data difficult to process. Feature selection is a common preprocessing technique that can improve performance: it selects the most relevant features from the original dataset and thereby reduces its dimensionality. However, feature selection is often an NP-hard problem, and in medical data the features carry multiple attributes, which affects classification accuracy. To improve the generality of the algorithm, we propose RLHHO, an improved version of Harris Hawks Optimization (HHO) that incorporates Q-learning-guided mutation strategies to solve the feature selection problem on a variety of medical gene datasets. We also derive a binary version, bRLHHO, by means of conversion functions and evaluate its performance on 12 high-dimensional datasets. The experimental results show that bRLHHO outperforms the original HHO in both classification accuracy and the number of selected features. On high-dimensional medical gene datasets with more than 1000 dimensions, bRLHHO achieves good accuracy with fewer features. In summary, compared with other algorithms, including the original HHO, the proposed RLHHO handles a variety of feature selection datasets and exhibits superior performance.
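As an illustration of how a continuous optimizer can be turned into a binary feature selector through a conversion function, the sketch below uses an S-shaped (sigmoid) transfer function and a weighted fitness that trades classification error against the fraction of selected features. The abstract does not state which conversion functions or fitness weighting bRLHHO uses, so the sigmoid, the weight `alpha`, and the names `s_shaped_transfer`, `binarize`, and `fitness` are assumptions chosen for illustration only.

```python
import math
import random


def s_shaped_transfer(x: float) -> float:
    """Map a real-valued position component to a probability in [0, 1].

    Assumed sigmoid transfer function, written in a numerically stable form.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 feature mask.

    A feature is selected (1) when a uniform random draw falls below
    the transfer-function probability of its position component.
    """
    return [1 if rng.random() < s_shaped_transfer(x) else 0 for x in position]


def fitness(error_rate: float, mask, alpha: float = 0.99) -> float:
    """Weighted sum of classification error and selected-feature ratio.

    Lower is better. alpha is a hypothetical weighting, not taken from the paper.
    """
    ratio = sum(mask) / len(mask)
    return alpha * error_rate + (1.0 - alpha) * ratio


if __name__ == "__main__":
    rng = random.Random(0)
    position = [rng.uniform(-4.0, 4.0) for _ in range(10)]  # toy search-agent position
    mask = binarize(position, rng)
    print(mask, fitness(0.1, mask))
```

In a full pipeline the `error_rate` argument would come from a classifier trained on the features selected by `mask`; here it is passed in directly to keep the sketch self-contained.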