Abstract

Variable selection has been highly successful in big data analyses, and regularization approaches are commonly used methods that automatically select important variables while constructing machine learning models. Because real datasets have complex relationships between groups of relevant variables, many group-sparsity regularization approaches have been proposed recently. However, these approaches are usually non-convex and sensitive to hyper-parameters, so optimizing them is a challenging task. In this paper, we present a novel memetic algorithm with complex harmonic regularization (MA-CHR), which combines an evolutionary computation (EC) algorithm for the hyper-parameters (global search) with complex harmonic regularization and a path-seeking strategy for group-sparsity variable selection (local search). We further introduce a novel genetic individual representation (intron+exon) to efficiently obtain the globally optimal solution of this group-sparsity regularization. Simulation and five real-data experiments demonstrate that the proposed MA-CHR method performs better than state-of-the-art regularization methods in selecting groups of relevant variables and in classification.

Highlights

  • Variable selection is an important issue in high-dimensional and massive data analysis and has attracted increasing attention in machine learning

  • To alleviate the limitations of existing hybrid evolutionary computation methods for variable selection, and to help globally optimize the hyper-parameters of group-sparsity regularization methods, we present a novel memetic algorithm with complex harmonic regularization (MA-CHR) for variable selection

  • This paper first uses an evolutionary computation method to globally optimize the hyper-parameters of the non-convex machine learning model, and achieves robust selection of related variable sets by using complex harmonic regularization
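The division of labour described in these highlights (an evolutionary global search over hyper-parameters wrapped around a regularized local search) can be sketched roughly as follows. This is only an illustration under strong assumptions: a plain L1 penalty stands in for complex harmonic regularization, there is no path-seeking strategy or intron+exon representation, and every name (`local_search`, `fitness`, `memetic_search`), the sparsity weight 0.01, and the synthetic data are hypothetical rather than taken from the paper.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def local_search(X, y, lam, sweeps=50):
    """Local search: coordinate descent for a fixed penalty lam.
    Assumption: an L1 penalty replaces the paper's regularizer."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            sq = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / sq if sq > 0 else 0.0
    return w

def fitness(lam, train, valid):
    """Validation error plus a small, hypothetical sparsity reward (lower is better)."""
    X_tr, y_tr = train
    X_va, y_va = valid
    w = local_search(X_tr, y_tr, lam)
    mse = sum((yi - sum(a * b for a, b in zip(xi, w))) ** 2
              for xi, yi in zip(X_va, y_va)) / len(y_va)
    n_selected = sum(1 for wj in w if wj != 0.0)
    return mse + 0.01 * n_selected, w

def memetic_search(train, valid, pop_size=6, generations=5, seed=0):
    """Global search: evolve candidate penalties, refining each by local search."""
    rng = random.Random(seed)
    pop = [rng.uniform(0.0, 1.0) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scored = sorted((fitness(lam, train, valid)[0], lam) for lam in pop)
        history.append(scored[0][0])
        # Elitism: the better half survives unchanged.
        parents = [lam for _, lam in scored[: pop_size // 2]]
        # Mutation: Gaussian perturbation of a random parent, clipped to [0, 1].
        children = [min(1.0, max(0.0, rng.choice(parents) + rng.gauss(0.0, 0.1)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    best_lam = min(pop, key=lambda lam: fitness(lam, train, valid)[0])
    return best_lam, fitness(best_lam, train, valid)[1], history

# Synthetic data: only the first two of four variables are relevant.
data_rng = random.Random(1)

def make_data(n):
    X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    y = [2.0 * x[0] - 1.5 * x[1] + data_rng.gauss(0, 0.1) for x in X]
    return X, y

train, valid = make_data(20), make_data(20)
best_lam, best_w, history = memetic_search(train, valid)
print(best_lam, best_w)
```

Because the better half of each population is carried over unchanged, the best fitness recorded in `history` cannot get worse from one generation to the next; that is the only property this sketch guarantees.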


Summary

Introduction

Variable selection is an important issue in high-dimensional and massive data analysis and has attracted increasing attention in machine learning. Regularization methods are commonly used for variable selection because they automatically select important variables while constructing a machine learning model, and their applications have become increasingly popular. Lasso [1] is one of the most popular regularization methods. Datasets typically exhibit high correlations or grouping effects between variables, especially in biological data, gene regulation networks, and protein-protein interactions. With supervised learning, regularization methods can rapidly generalize to new tasks that contain some correlation information. Some regularization methods can generate group-sparsity learning models without prior knowledge.
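As a minimal illustration of how an L1 penalty such as the Lasso selects variables while fitting a model, the sketch below runs coordinate descent with soft-thresholding on a toy dataset. The function names (`soft_threshold`, `lasso_cd`), the orthogonal toy design, and the penalty value `lam=0.1` are assumptions made for illustration and are not taken from the paper.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes column j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            sq = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / sq if sq > 0 else 0.0
    return w

# Toy design with three mutually orthogonal columns (hypothetical data).
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# The response depends strongly on column 0 and only very weakly on column 2.
y = [3.0 * row[0] + 0.05 * row[2] for row in X]

w = lasso_cd(X, y, lam=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(w, selected)
```

With this design the weak effect of column 2 falls below the threshold and is set exactly to zero, so only column 0 is selected. This is the sense in which regularization performs variable selection during model fitting; it does not by itself account for grouping effects between correlated variables.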

