An ensemble framework for microarray data classification based on feature subspace partitioning

Vahid Nosrati, Mohsen Rahmani

doi:10.1016/j.compbiomed.2022.105820

Abstract

Feature selection is vulnerable to the curse of dimensionality, a problem that is exacerbated by high-dimensional data such as microarrays. Moreover, the low-instance/high-feature (LIHF) property of microarray data means that considerable processing time is needed to compute and compare feature scores when choosing the best feature subset, which has prompted many efforts to mitigate the LIHF property of such genomic medicine data. Motivated by the promising results of ensemble models in machine learning problems, this paper presents a novel framework, named feature-level aggregation-based ensemble based on overlapped feature subspace partitioning (FLAE-OFSP), for microarray data classification. The proposed ensemble has three main steps: first, several feature subsets are generated by the proposed partitioning approach; second, a feature selection algorithm (i.e., a feature ranker) is applied to each subset; and finally, the per-subset results are combined into a single ranked list using six defined aggregation functions. Evaluation of the framework on seven microarray datasets using four measures (stability, classification accuracy, runtime, and Modscore) shows a substantial runtime improvement, along with good results on the other measures, compared with individual methods.
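The three steps named in the abstract (partition, rank each subset, aggregate) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the wrap-around way overlap is created, the simple class-separation score used as the ranker, and the mean used as the single aggregation function. The sketch also assumes binary labels coded 0 and 1. Because this stand-in ranker scores each feature independently, a feature's score does not depend on the subset it lands in; a multivariate ranker would behave differently.

```python
import random
from statistics import mean, pstdev


def overlapped_partitions(n_features, n_subsets, overlap, seed=0):
    """Hypothetical partitioner: shuffle feature indices, cut them into
    blocks, then extend each block with `overlap` indices from the next
    block (wrapping around) so that neighbouring subsets share features."""
    rng = random.Random(seed)
    idx = list(range(n_features))
    rng.shuffle(idx)
    size = -(-n_features // n_subsets)  # ceiling division
    blocks = [idx[i:i + size] for i in range(0, n_features, size)]
    subsets = []
    for b, block in enumerate(blocks):
        nxt = blocks[(b + 1) % len(blocks)]
        extra = [j for j in nxt[:overlap] if j not in block]
        subsets.append(block + extra)
    return subsets


def score_feature(column, labels):
    """Stand-in ranker score: gap between class means relative to spread."""
    a = [v for v, y in zip(column, labels) if y == 0]
    b = [v for v, y in zip(column, labels) if y == 1]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)


def rank_subset(X, labels, subset):
    """Apply the ranker to one feature subset; returns {feature: score}."""
    return {j: score_feature([row[j] for row in X], labels) for j in subset}


def aggregate(score_maps, combine=mean):
    """Merge per-subset scores into one ranked list (best feature first).
    `combine` is the aggregation function; mean is only one possible choice."""
    collected = {}
    for scores in score_maps:
        for j, s in scores.items():
            collected.setdefault(j, []).append(s)
    return sorted(collected, key=lambda j: (-combine(collected[j]), j))


def ensemble_rank(X, labels, n_subsets=3, overlap=1, seed=0):
    """Partition, rank each subset, then aggregate into a single list."""
    parts = overlapped_partitions(len(X[0]), n_subsets, overlap, seed)
    return aggregate([rank_subset(X, labels, p) for p in parts])


# Toy data: 4 samples, 5 features; feature 1 separates the classes best.
X = [
    [1, 0, 3, 1, 0],
    [2, 1, 3, 3, 2],
    [1, 5, 3, 2, 1],
    [2, 6, 3, 4, 1],
]
y = [0, 0, 1, 1]
ranking = ensemble_rank(X, y, n_subsets=2, overlap=1)
print(ranking)
```

Passing a different callable as `combine` (for example `max` or `min`) swaps the aggregation rule without touching the partitioning or ranking steps, which is one way to compare several aggregation functions over the same subsets.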

Full Text