Abstract

DNA microarray data analysis is notoriously difficult because of a massive number of features, imbalanced class distributions, and a limited number of available samples. In this paper, we focus on high-dimensional multi-class imbalanced problems, which make it hard for conventional classifiers to perform well on both the minority and majority classes. Numerous efforts have addressed either high dimensionality or class imbalance. Nonetheless, few methods address multi-class imbalance and high dimensionality concurrently, owing to their intricate interactions. This paper presents novel hybrid algorithms for feature selection in high-dimensional multi-class imbalanced problems using multiple filter-based rankers (MFR) and a hybrid Grasshopper Optimization Algorithm (GOA). The Simulated Annealing (SA) algorithm is incorporated into GOA to enhance the best solution that GOA finds. The aim is to tackle slow convergence and improve exploitation by searching the high-quality regions identified by GOA. The experimental results confirm the effectiveness of the proposed methods in improving classification performance in terms of area under the curve (AUC) compared with other well-known methods, which indicates that the proposed methods can search the feature space and identify robust and discriminative features that best predict the minority class.
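
The abstract does not say how the multiple filter-based rankers are combined. As an illustration only, the sketch below assumes mean-rank aggregation: each filter ranks the features by its own score, and the features with the lowest average rank are kept. The function names, filter names, and scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def rank_features(scores: Sequence[float]) -> List[int]:
    """Return each feature's rank (0 = best) for one filter; higher score is better."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, feature in enumerate(order):
        ranks[feature] = position
    return ranks


def aggregate_rankers(score_table: Dict[str, Sequence[float]], top_k: int) -> List[int]:
    """Combine several filter rankers by mean rank (an assumed rule) and keep top_k features."""
    per_filter_ranks = [rank_features(s) for s in score_table.values()]
    n_features = len(per_filter_ranks[0])
    mean_rank = [
        sum(r[j] for r in per_filter_ranks) / len(per_filter_ranks)
        for j in range(n_features)
    ]
    # Sort by mean rank; ties are broken by feature index for determinism.
    return sorted(range(n_features), key=lambda j: (mean_rank[j], j))[:top_k]


# Made-up scores from three hypothetical filters for five features.
toy_scores = {
    "filter_a": [0.9, 0.1, 0.5, 0.7, 0.2],
    "filter_b": [0.8, 0.2, 0.6, 0.4, 0.1],
    "filter_c": [0.7, 0.3, 0.9, 0.5, 0.0],
}
selected = aggregate_rankers(toy_scores, top_k=3)
print(selected)
```

In a pipeline of the kind the abstract describes, a reduced feature list like `selected` would then form the search space for the metaheuristic stage.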

Highlights

  • Over the last decades, rapid technological developments have enabled researchers to analyse a massive amount of data from various application domains such as biomedical, information retrieval, and text classification [1]

  • Multiple filter-based rankers (MFR) coupled with hybrid metaheuristic techniques based on the Grasshopper Optimization Algorithm (GOA) were proposed (MFR-GOA, MFR-GOASA, and MFR-GOASAT)

  • In the High-Level Transmit Hybrid (HTH), Simulated Annealing (SA) was used to search the neighbourhood of the best-found solution after each iteration of GOA; two models were proposed using these algorithms, namely MFR-GOASA and MFR-GOASAT
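
The neighbourhood search mentioned in the highlights can be sketched as a generic simulated annealing loop over a binary feature mask. This is a minimal illustration, not the authors' implementation: the single bit-flip neighbourhood, the geometric cooling schedule, the parameter values, and the toy fitness function (a stand-in for a classifier-based score such as AUC) are all assumptions.

```python
import math
import random
from typing import Callable, List


def sa_refine(
    best_mask: List[int],
    fitness: Callable[[List[int]], float],
    t0: float = 1.0,
    cooling: float = 0.9,
    steps: int = 200,
    seed: int = 0,
) -> List[int]:
    """Refine a binary feature mask with simulated annealing (higher fitness is better)."""
    rng = random.Random(seed)
    current = best_mask[:]
    current_fit = fitness(current)
    best, best_fit = current[:], current_fit
    temperature = t0
    for _ in range(steps):
        # Neighbour: flip one randomly chosen feature in or out of the subset.
        neighbour = current[:]
        flip = rng.randrange(len(neighbour))
        neighbour[flip] = 1 - neighbour[flip]
        neighbour_fit = fitness(neighbour)
        delta = neighbour_fit - current_fit
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_fit = neighbour, neighbour_fit
            if current_fit > best_fit:
                best, best_fit = current[:], current_fit
        # Geometric cooling (assumed schedule), floored to avoid division by zero.
        temperature = max(temperature * cooling, 1e-12)
    return best


def toy_fitness(mask: List[int]) -> float:
    """Hypothetical fitness: reward features 0 and 2, penalise subset size."""
    informative = (0, 2)
    return sum(mask[i] for i in informative) - 0.1 * sum(mask)


start = [0, 1, 0, 1, 1]  # stand-in for the best mask found by the GOA stage
refined = sa_refine(start, toy_fitness)
print(refined, toy_fitness(refined))
```

Because the loop tracks the best mask it has visited, the returned solution is never worse than the starting one under the chosen fitness, which matches the stated role of SA as a refinement step applied to the best solution found by GOA.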

Summary

Introduction

Rapid technological developments have enabled researchers to analyse massive amounts of data from various application domains such as biomedicine, information retrieval, and text classification [1]. These datasets are characterised by a massive number of features, a limited number of available samples, and imbalanced class distributions; these open challenges degrade classification performance. Most real-world datasets are affected by the class imbalance problem because majority-class examples (the negative class) outnumber minority-class examples (the positive class).
