Abstract

Identifying highly correlated genes in high-dimensional data is an important problem in the study of cancer and genetic diseases. Selecting relevant biomarkers from gene expression data is challenging because the data contain important correlation structures: some genes fall into groups that share a biological function, chromosomal location or regulation. In this paper, we propose CHR-DE, a penalized accelerated failure time model that combines a non-convex regularization (local search) with differential evolution (global search) in a wrapper-embedded memetic framework. The complex harmonic regularization (CHR) can approximate the combination of ℓp and ℓq (1 ≤ q < 2) penalties for selecting biomarkers in groups. Differential evolution (DE) is used to globally optimize the CHR hyperparameters, which gives CHR-DE a strong capability for selecting groups of genes in high-dimensional biological data. We also develop an efficient path seeking algorithm to optimize this penalized model. The proposed method is evaluated on synthetic data and three gene expression datasets: breast cancer, hepatocellular carcinoma and colorectal cancer. The experimental results demonstrate that CHR-DE is a more effective tool for feature selection and learning prediction.

Highlights

  • Feature selection is an important step in selecting biomarkers from high-dimensional, small-sample biological data

  • We developed an efficient path seeking algorithm to optimize this penalized model

  • We propose a penalized accelerated failure time model, complex harmonic regularization (CHR)-differential evolution (DE), to identify biomarkers that are both biologically meaningful and clinically


Summary

Introduction

Feature selection is an important step in selecting biomarkers from high-dimensional, small-sample biological data. We employ a complex harmonic regularization (CHR) [10] that approximates the combination of ℓp and ℓq (1 ≤ q < 2) to select the key factors in groups among all features. This approach avoids determining the value of p or q in advance, i.e., we do not need to assume the probability distribution of the data before evaluating the grouping effect and sparsity by the existing regularization methods. Liu et al. [13] proposed a hybrid genetic algorithm that combines a genetic algorithm with an embedded ℓ1/2 + ℓ2 regularization. Such evolutionary algorithms are well suited to tuning the hyperparameters of these multimodal penalty functions. We present a wrapper-embedded memetic framework that uses DE to globally optimize the hyperparameters of the non-convex regularization CHR, which serves as a local search to select biomarkers in groups.
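The wrapper-embedded idea can be illustrated with a minimal sketch, under several assumptions that are not from the paper. The form of the CHR penalty is not given in this summary, so an elastic-net penalty solved by coordinate descent stands in as the embedded local search. A plain squared-error loss on uncensored synthetic responses stands in for the accelerated failure time model. All function names, bounds and DE settings (`F`, `CR`, population size) are hypothetical choices for illustration only.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net_cd(X, y, lam, alpha, n_iter=30):
    """Embedded local search (stand-in for CHR): coordinate descent on
    (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X @ beta
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def mse(X, y, beta):
    return sum((yi - sum(x * b for x, b in zip(xi, beta))) ** 2
               for xi, yi in zip(X, y)) / len(y)

def fitness(theta, train, valid):
    """Wrapper objective: validation error of the model fitted with theta."""
    lam, alpha = theta
    beta = elastic_net_cd(train[0], train[1], lam, alpha)
    return mse(valid[0], valid[1], beta)

def differential_evolution(fit, bounds, pop_size=8, gens=10, F=0.6, CR=0.9, seed=0):
    """Global search: DE/rand/1/bin with greedy in-place replacement."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fit(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                else:
                    v = pop[i][d]
                lo, hi = bounds[d]
                trial.append(min(max(v, lo), hi))  # clip to the search box
            s = fit(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def make_data(n, p, true_beta, rng, noise=0.5):
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(xi, true_beta)) + rng.gauss(0, noise) for xi in X]
    return X, y

rng = random.Random(1)
true_beta = [2.0, -1.5, 1.0] + [0.0] * 7  # three relevant features out of ten
train = make_data(30, 10, true_beta, rng)
valid = make_data(30, 10, true_beta, rng)
bounds = [(1e-3, 1.0), (0.0, 1.0)]  # assumed ranges for (lam, alpha)
best_theta, best_score = differential_evolution(
    lambda t: fitness(t, train, valid), bounds)
best_beta = elastic_net_cd(train[0], train[1], *best_theta)
selected = [j for j, b in enumerate(best_beta) if abs(b) > 1e-8]
print("best (lam, alpha):", best_theta)
print("selected features:", selected)
```

Here DE never touches the coefficients directly. It proposes hyperparameter vectors, and each proposal is scored by running the embedded penalized fit and measuring held-out error, which is the division of labor between global and local search that a memetic framework relies on.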

Accelerated failure time model
Path seeking algorithm for complex harmonic regularization penalty
3: Compute {λ_j(ν)} for j = 1, ..., k
A wrapper-embedded memetic framework
Chromosome representation
Synthetic datasets
Real datasets
Conclusion