The development of data sensing technology has generated a vast amount of high-dimensional data, posing great challenges for machine learning models. Over the past decades, despite demonstrating its effectiveness in data classification, genetic programming (GP) has still encountered three major challenges when dealing with high-dimensional data: 1) solution diversity; 2) multiclass imbalance; and 3) large feature space. In this article, we have developed a problem-specific multiobjective GP framework (PS-MOGP) for handling classification tasks with high-dimensional data. To reduce the large solution space caused by high dimensionality, we incorporate the recursive feature elimination strategy based on mining the archive of evolved GP solutions. A progressive domination Pareto archive evolution strategy (PD-PAES), which optimizes the objectives in a specific order according to their objectives, is proposed to evaluate the GP individuals and maintain a better diversity of solutions. Besides, to address the seriously imbalanced class issue caused by traditional binary decomposition (BD) one versus rest (OVR) for multiclass classification problems, we design a method named BD with a similar positive and negative class size (BD-SPNCS) to generate a set of auxiliary classifiers. Experimental results on benchmark and real-world datasets demonstrate that our proposed PS-MOGP outperforms state-of-the-art traditional and evolutionary classification methods in the context of high-dimensional data classification.
Read full abstract