Hybridization of Base Classifiers of Random Subsample Ensembles for Enhanced Performance in High Dimensional Feature Spaces

Santhosh Pathical, Gursel Serpen

doi:10.1109/icmla.2010.118

Abstract

This paper presents a simulation-based empirical study of the performance profile of random subsample ensembles with a hybrid mix of base learners in high dimensional feature spaces. The performance of a hybrid random subsample ensemble that combines C4.5, k-nearest neighbor (kNN), and naïve Bayes base learners is assessed through statistical testing against homogeneous random subsample ensembles that employ only one type of base learner. The simulation study employs five datasets with up to 20K features from the UCI Machine Learning Repository. Random subsampling without replacement is used to map the original high dimensional feature space of each of the five datasets to a multiplicity of lower dimensional feature subspaces. The study explores the effect of two design parameters, the count of base classifiers and the subsampling rate, on the performance of the hybrid random subsample ensemble. The ensemble architecture uses the voting combiner in all cases. Simulation results indicate that hybridizing the base learners of a random subsample ensemble improves prediction accuracy and yields more robust performance.
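
The ensemble construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the subsampling rate is read as the fraction of features kept per base classifier, base learner types are assigned round-robin, and the voting combiner is taken to be plurality voting. The names `sample_feature_subspace`, `NearestNeighbor`, `NearestCentroid`, and `HybridRandomSubspaceEnsemble` are hypothetical, and the two toy learners are stand-ins for illustration only; C4.5 and naïve Bayes are not implemented here.

```python
import random
from collections import Counter


def sample_feature_subspace(n_features, rate, rng):
    """Draw feature indices without replacement; rate is the assumed fraction kept."""
    size = max(1, int(round(rate * n_features)))
    return sorted(rng.sample(range(n_features), size))


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


class NearestNeighbor:
    """1-nearest-neighbor stand-in for the kNN base learner."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_one(self, x):
        dists = [squared_distance(row, x) for row in self.X]
        return self.y[dists.index(min(dists))]


class NearestCentroid:
    """Second toy learner type (a stand-in, not C4.5 or naive Bayes)."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        # One centroid per class: the per-feature mean of that class's rows.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, x):
        return min(self.centroids,
                   key=lambda c: squared_distance(self.centroids[c], x))


class HybridRandomSubspaceEnsemble:
    """Each member sees its own random feature subspace; members vote."""

    def __init__(self, learner_types, n_classifiers, rate, seed=0):
        self.learner_types = learner_types
        self.n_classifiers = n_classifiers
        self.rate = rate
        self.rng = random.Random(seed)
        self.members = []

    def fit(self, X, y):
        n_features = len(X[0])
        self.members = []
        for i in range(self.n_classifiers):
            cols = sample_feature_subspace(n_features, self.rate, self.rng)
            # Assumed round-robin mix of base learner types.
            learner = self.learner_types[i % len(self.learner_types)]()
            projected = [[row[c] for c in cols] for row in X]
            self.members.append((cols, learner.fit(projected, y)))
        return self

    def predict_one(self, x):
        # Plurality vote over the members' predictions.
        votes = Counter(model.predict_one([x[c] for c in cols])
                        for cols, model in self.members)
        return votes.most_common(1)[0][0]
```

A homogeneous ensemble in this sketch is simply the case where `learner_types` holds a single class, which makes the hybrid and homogeneous variants directly comparable under the same classifier count and subsampling rate.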

Full Text