Abstract

We propose a new ensemble classification algorithm, named super random subspace ensemble (Super RaSE), to tackle the sparse classification problem. The proposed algorithm is motivated by the random subspace ensemble algorithm (RaSE). The RaSE method was shown to be a flexible framework that can be coupled with any existing base classifier. However, the success of RaSE largely depends on the proper choice of the base classifier, which is typically unknown in advance. In this work, we show that Super RaSE avoids the need to choose a base classifier by randomly sampling a collection of classifiers together with the subspace. As a result, Super RaSE is more flexible and robust than RaSE. In addition to the vanilla Super RaSE, we also develop the iterative Super RaSE, which adaptively updates both the base classifier distribution and the subspace distribution. We show that the Super RaSE algorithm and its iterative version perform competitively on a wide range of simulated data sets and two real data examples. Both algorithms are implemented in a new version of the R package RaSEn.
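The core idea described above can be sketched in a few lines: each ensemble member is built by sampling several candidate (subspace, base classifier) pairs, keeping the pair with the lowest cross-validation error, and letting the kept members vote on the final label. The sketch below is an illustrative assumption, not the authors' RaSEn implementation; the classifier pool, the parameters `B1`, `B2`, `d`, and the 3-fold CV criterion are all choices made here for demonstration.

```python
# Minimal sketch of a Super RaSE-style ensemble (illustrative; not the RaSEn package).
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def super_rase_fit(X, y, B1=20, B2=10, d=3, rng=None):
    """Fit B1 ensemble members; each picks the best of B2 random
    (subspace, classifier) pairs by 3-fold cross-validation error."""
    rng = np.random.default_rng(rng)
    # Assumed classifier pool -- Super RaSE samples the base learner too.
    pool = [LogisticRegression(max_iter=1000),
            KNeighborsClassifier(n_neighbors=5),
            DecisionTreeClassifier(max_depth=3)]
    members = []
    for _ in range(B1):
        best = None
        for _ in range(B2):
            S = rng.choice(X.shape[1], size=d, replace=False)   # random subspace
            clf = clone(pool[rng.integers(len(pool))])          # random base classifier
            err = 1 - cross_val_score(clf, X[:, S], y, cv=3).mean()
            if best is None or err < best[0]:
                best = (err, S, clf)
        _, S, clf = best
        members.append((S, clf.fit(X[:, S], y)))
    return members

def super_rase_predict(members, X):
    """Majority vote over the fitted members (binary 0/1 labels assumed)."""
    votes = np.stack([clf.predict(X[:, S]) for S, clf in members])
    return (votes.mean(axis=0) > 0.5).astype(int)
```

Because the cross-validation step screens both the subspace and the classifier type jointly, members that pair an informative subspace with a well-suited learner are retained, which is what removes the need to fix one base classifier up front.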

Highlights

  • Classification is an important research topic with applications in a wide range of subjects, including finance, engineering, economics, and medicine (Kotsiantis et al. 2007)

  • We show boxplots of the selected proportions of the noisy features to verify whether the Super RaSE algorithms can distinguish the important features from the noisy ones

  • In this work, motivated by the random subspace ensemble (RaSE) classification, we propose a new ensemble classification framework, named Super RaSE, which is a completely model-free approach

Introduction

Classification is an important research topic with applications in a wide range of subjects, including finance, engineering, economics, and medicine (Kotsiantis et al. 2007). We observe a collection of observations with p features x and their associated class label y. The most studied setting is binary classification, where y ∈ {0, 1}. One important engineering application is traffic classification (Dainotti et al. 2012), which can substantially improve traffic management. When the number of features is large, the problem is called high-dimensional classification. Gao et al. (2017) further study a post-selection shrinkage method to improve classical penalized estimators for high-dimensional classification. Other related works include Szczygieł et al. (2014), Michalski et al. (2018), Tong et al. (2018), and Dvorsky et al. (2021).
