Abstract
Identifying highly correlated genes in high-dimensional data is an important problem in the study of cancer and genetic diseases. Selecting relevant biomarkers is challenging because gene expression data contain important correlation structures, and some genes can be divided into groups that share a biological function, chromosomal location, or regulation. In this paper, we propose CHR-DE, a penalized accelerated failure time model that combines a non-convex regularization (local search) with differential evolution (global search) in a wrapper-embedded memetic framework. The complex harmonic regularization (CHR) can approximate the combination of ℓp and ℓq (1 ≤ q < 2) penalties for selecting biomarkers in groups. Differential evolution (DE) is used to globally optimize the CHR hyperparameters, which gives CHR-DE a strong capability for selecting groups of genes in high-dimensional biological data. We also develop an efficient path-seeking algorithm to optimize this penalized model. The proposed method is evaluated on synthetic data and three gene expression datasets: breast cancer, hepatocellular carcinoma, and colorectal cancer. The experimental results demonstrate that CHR-DE is a more effective tool for feature selection and learning prediction.
Highlights
Feature selection is an important step for selecting biomarkers in high-dimensional biological data with small sample sizes
We developed an efficient path-seeking algorithm to optimize the penalized model
We propose a penalized accelerated failure time model, complex harmonic regularization with differential evolution (CHR-DE), to recognize biomarkers that are both biologically meaningful and clinically
Summary
Feature selection is an important step for selecting biomarkers in high-dimensional biological data with small sample sizes. We employ a complex harmonic regularization (CHR) [10] that approximates the combination of ℓp and ℓq (1 ≤ q < 2) penalties to select the key factors in groups among all features. This approach avoids determining the value of p or q in advance; that is, we do not need to assume the probability distribution of the data before evaluating the grouping effect and sparsity, as is required by the existing regularization methods. Liu et al. [13] proposed a hybrid genetic algorithm that combines a genetic algorithm with an embedded ℓ1/2 + ℓ2 regularization. Such evolutionary algorithms are well suited to tuning the hyperparameters of these multimodal penalty functions. We present a wrapper-embedded memetic framework that uses DE to globally optimize the hyperparameters of the non-convex CHR regularization, which acts as a local search to select biomarkers in groups.
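As a rough illustration of the wrapper-embedded memetic idea described above, the following Python sketch lets a small differential evolution loop (global search) tune the hyperparameters of a penalized regression that is fitted by coordinate descent (local search). Everything in it is an assumption made for illustration: the elastic-net penalty stands in for CHR, whose exact form is not given here; least squares on uncensored synthetic data stands in for the accelerated failure time model; and the function names, search bounds, and DE settings (`fit_elastic_net`, `differential_evolution`, `F`, `CR`) are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator produced by the l1 part of the penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_elastic_net(X, y, lam, alpha, sweeps=30):
    """Local search: coordinate descent on
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    Elastic net is a stand-in penalty here, NOT the CHR penalty."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the partial residual (feature j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def mse(X, y, beta):
    """Mean squared prediction error of a linear model."""
    return sum((yi - sum(x * b for x, b in zip(row, beta))) ** 2
               for row, yi in zip(X, y)) / len(y)


def differential_evolution(objective, bounds, pop_size=8, generations=10,
                           F=0.6, CR=0.9, rng=None):
    """Global search: a basic rand/1/bin differential evolution loop."""
    rng = rng or random.Random(0)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == forced:
                    lo, hi = bounds[d]
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))  # clip to the search box
                else:
                    trial.append(pop[i][d])
            s = objective(trial)
            if s <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def make_data(n, true_beta, rng):
    """Synthetic linear data with Gaussian features and small noise."""
    X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0, 0.1) for row in X]
    return X, y


rng = random.Random(42)
true_beta = [2.0, -1.5, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # three relevant features
X_tr, y_tr = make_data(40, true_beta, rng)
X_va, y_va = make_data(20, true_beta, rng)


def objective(h):
    """Wrapper step: validation error of the model fitted with hyperparameters h."""
    lam, alpha = h
    return mse(X_va, y_va, fit_elastic_net(X_tr, y_tr, lam, alpha))


best_h, best_score = differential_evolution(objective, [(1e-3, 1.0), (0.0, 1.0)], rng=rng)
best_beta = fit_elastic_net(X_tr, y_tr, *best_h)
selected = [j for j, b in enumerate(best_beta) if abs(b) > 1e-8]
print("best (lam, alpha):", best_h)
print("validation MSE:", best_score)
print("selected feature indices:", selected)
```

The split of roles mirrors the framework only loosely: the outer loop never looks inside the penalized fit and scores each hyperparameter vector purely by held-out error, while the inner solver performs the sparse selection. Replacing `fit_elastic_net` with a CHR solver and the squared error with an AFT loss would be needed to approach the actual method.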