Abstract

Simple Summary

Because it lacks exploitation capability, a traditional genetic algorithm (GA) cannot accurately identify the minimal best gene subset. An improved splicing method is therefore introduced into the GA to strengthen exploitation and balance it against exploration. The resulting method identifies the true gene subset with high probability. Furthermore, a dataset of the body weight of Hu sheep has been used to show that the proposed method can obtain a better minimal subset of genes within a few iterations, compared with all considered algorithms, including the genetic algorithm and the adaptive best-subset selection algorithm.

Selecting the minimal best subset out of a huge number of factors influencing the response is a fundamental and very challenging NP-hard problem: the presence of many redundant genes easily results in over-fitting, missing an important gene can have an even more detrimental impact on predictions, and exhaustive search is computationally prohibitive. We propose a modified memetic algorithm (MA) based on an improved splicing method to overcome the weak exploitation capability of the traditional genetic algorithm and to reduce the dimension of the predictor variables. The new algorithm accelerates the search for the minimal best subset of genes by incorporating the improved splicing method into a new local search operator. The improvement is also due to two further novel aspects: (a) updating subsets of genes iteratively until splicing yields no further reduction in the loss function, which increases the probability of selecting the true subset of genes; and (b) introducing add and del operators based on backward sacrifice into the splicing method to limit the size of gene subsets. Moreover, the mutation operator is replaced by the improved splicing method to enhance exploitation capability, and the initial individuals are refined by it to make the search more efficient. A dataset of the body weight of Hu sheep was used to evaluate the superiority of the modified MA against the genetic algorithm. According to our experimental results, the proposed optimizer can obtain a better minimal subset of genes within a few iterations, compared with all considered algorithms, including the most advanced adaptive best-subset selection algorithm.
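The idea of replacing mutation with a splicing-style local search can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a least-squares loss without intercept, a fixed subset size `k`, and an exhaustive one-for-one swap in place of the sacrifice-based splicing and the add and del operators described above. The names `rss`, `splice`, and `memetic_select` are hypothetical.

```python
import random


def rss(X, y, subset):
    """Residual sum of squares of a least-squares fit on the chosen columns."""
    cols = sorted(subset)
    k = len(cols)
    if k == 0:
        return sum(v * v for v in y)
    # Normal equations A b = c, with a tiny ridge term for numerical safety.
    A = [[sum(r[i] * r[j] for r in X) + (1e-8 if i == j else 0.0)
          for j in cols] for i in cols]
    c = [sum(r[i] * v for r, v in zip(X, y)) for i in cols]
    # Gaussian elimination with partial pivoting.
    for d in range(k):
        q = max(range(d, k), key=lambda r: abs(A[r][d]))
        A[d], A[q] = A[q], A[d]
        c[d], c[q] = c[q], c[d]
        for r in range(d + 1, k):
            f = A[r][d] / A[d][d]
            for s in range(d, k):
                A[r][s] -= f * A[d][s]
            c[r] -= f * c[d]
    # Back substitution.
    b = [0.0] * k
    for d in reversed(range(k)):
        tail = sum(A[d][s] * b[s] for s in range(d + 1, k))
        b[d] = (c[d] - tail) / A[d][d]
    return sum((v - sum(r[j] * bj for j, bj in zip(cols, b))) ** 2
               for r, v in zip(X, y))


def splice(X, y, subset, p):
    """Swap one selected gene for one unselected gene while the loss drops."""
    best = set(subset)
    best_loss = rss(X, y, best)
    improved = True
    while improved:
        improved = False
        for out in sorted(best):
            for inn in range(p):
                if inn in best:
                    continue
                cand = (best - {out}) | {inn}
                loss = rss(X, y, cand)
                if loss < best_loss - 1e-9:
                    best, best_loss, improved = cand, loss, True
                    break
            if improved:
                break
    return best, best_loss


def memetic_select(X, y, k, pop_size=6, generations=3, seed=0):
    """Return (subset, loss) for the best size-k subset found."""
    rng = random.Random(seed)
    p = len(X[0])
    # Initial individuals are refined by the local search.
    pop = [splice(X, y, rng.sample(range(p), k), p) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Selection: binary tournament on the loss.
            a = min(rng.sample(pop, 2), key=lambda t: t[1])
            b = min(rng.sample(pop, 2), key=lambda t: t[1])
            # Crossover: draw k genes from the union of both parents.
            child = rng.sample(sorted(a[0] | b[0]), k)
            # Local search takes the place of the mutation operator.
            children.append(splice(X, y, child, p))
        pop = sorted(pop + children, key=lambda t: t[1])[:pop_size]
    return pop[0]
```

With `X` given as a list of rows and `y` as a list of responses, `memetic_select(X, y, k=3)` returns the selected column indices as a set together with their loss. The exhaustive swap costs one model fit per candidate pair, so it only suits small toy problems; avoiding that cost is the role the abstract assigns to the sacrifice-based splicing.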

Highlights

  • Licensee MDPI, Basel, Switzerland. In data mining, feature selection is a fundamental strategy to handle “the curse of dimensionality” [1].

  • The main advantages of the proposed method are two-fold: (1) it can locate promising subsets of genes within the whole gene subset space by means of selection and crossover operators; and (2) compared with the traditional genetic algorithm (GA), it has strong exploitation capability and recovers the minimal best subset of genes with high probability thanks to an improved splicing method.

  • To test the performance of the feature selection methods in predicting body weight on each occasion, the instances are divided into a training set (170 samples) and a test set.


Summary

Introduction

In data mining, feature selection is a fundamental strategy to handle “the curse of dimensionality” [1]. The filter algorithm selects subsets of genes based on data characteristics such as distance [6], correlation [7], and statistical distribution [8] [9]. Gene selection using filter algorithms is fast and simple, but the top k genes contain some redundant and irrelevant genes, because correlation between genes is not considered and the feature evaluation principle is unreliable. The hybrid feature selection method is usually used to select a few important genes out of a huge number of genes [10]. The filter algorithm is first used to eliminate many genes, and the wrapper algorithm is then used to further compact the selected subset of genes [11]. One of the most typical methods is the hybrid dragonfly black hole algorithm for gene selection for the RNA-seq

