Abstract

High dimensionality and class imbalance are two main challenges in classification, and a growing number of datasets now exhibit both at once. Genetic programming (GP) has been successfully applied to high-dimensional classification tasks. However, most existing GP methods may suffer from a performance bias toward the majority class when the class distribution is unbalanced. Using the fitness function to adjust for misclassification costs is one of the most important ways to address class imbalance in GP. This paper develops new GP fitness functions for classification with high-dimensional unbalanced data. Two fitness functions are proposed to improve on traditional accuracy-based measures, and a third approximates the Area Under the ROC Curve (AUC) to reduce training time. Experiments on six high-dimensional unbalanced datasets show that the proposed fitness functions perform better than existing fitness functions.
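The abstract does not define the proposed fitness functions, so the sketch below shows only well-known baselines of the kind they are compared against: overall accuracy, average (balanced) accuracy over the two classes, and the exact AUC computed with the Wilcoxon-Mann-Whitney statistic. It assumes binary labels with the minority class encoded as `1`, and a GP program whose numeric output is treated as a score and thresholded at 0 to predict the minority class; both conventions are assumptions for illustration, and all function names are hypothetical. The exact AUC compares every minority/majority pair, so its cost grows with the product of the two class sizes for every program evaluated, which is one plausible reason to approximate it during evolution.

```python
from typing import Sequence

MINORITY = 1  # assumption: minority class encoded as 1, majority as 0


def predict(scores: Sequence[float]) -> list[int]:
    """Assumed GP convention: output > 0 predicts the minority class."""
    return [MINORITY if s > 0 else 0 for s in scores]


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Overall accuracy; can look high even if the minority class is ignored."""
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)


def average_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Mean of the per-class accuracies (true positive and true negative rates)."""
    tp = sum(1 for p, y in zip(preds, labels) if y == MINORITY and p == MINORITY)
    tn = sum(1 for p, y in zip(preds, labels) if y != MINORITY and p != MINORITY)
    n_min = sum(1 for y in labels if y == MINORITY)
    n_maj = len(labels) - n_min
    return 0.5 * (tp / n_min + tn / n_maj)


def wmw_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Exact AUC: fraction of (minority, majority) pairs ranked correctly.

    Ties count as half a correct ranking. Cost is O(n_min * n_maj).
    """
    pos = [s for s, y in zip(scores, labels) if y == MINORITY]
    neg = [s for s, y in zip(scores, labels) if y != MINORITY]
    total = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # 2 minority, 8 majority
    always_majority = [0] * len(labels)
    print(accuracy(always_majority, labels))          # 0.8
    print(average_accuracy(always_majority, labels))  # 0.5

    scores = [0.9, -0.2, 0.5, -0.1, -0.3, -0.5, -0.6, -0.7, -0.8, -0.9]
    preds = predict(scores)
    print(average_accuracy(preds, labels))  # 0.6875
    print(wmw_auc(scores, labels))          # 0.875
```

The "always predict the majority class" case shows the bias the abstract refers to: overall accuracy rewards it with 0.8, while average accuracy scores it at 0.5, no better than chance.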
