Abstract

The predictive performance of a random forest ensemble is highly associated with the strength of individual trees and their diversity. An ensemble of a small number of accurate and diverse trees, if it does not compromise prediction accuracy, will also reduce the computational burden. We investigate the idea of integrating trees that are both accurate and diverse. For this purpose, we utilize out-of-bag observations as a validation sample from the training bootstrap samples to choose the best trees based on their individual performance, and then assess these trees for diversity using the Brier score on an independent validation sample. Starting from the single best tree, a tree is selected for the final ensemble if its addition to the forest reduces the error of the trees that have already been added. Unlike random projection ensemble classification, our approach does not use an implicit dimension reduction for each tree. A total of 35 benchmark classification and regression problems are used to assess the performance of the proposed method and to compare it with random forest, random projection ensemble, node harvest, support vector machine, kNN and classification and regression tree. We compute unexplained variances or classification error rates for all the methods on the corresponding data sets. Our experiments reveal that the size of the ensemble is reduced significantly and better results are obtained in most cases. Results of a simulation study are also given, where four tree-style scenarios are considered to generate data sets with several structures.
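The selection procedure described in the abstract can be illustrated in code. The following Python snippet is a minimal sketch under stated assumptions, not the authors' implementation: it assumes NumPy arrays, scikit-learn decision trees, a binary 0/1 target, and hypothetical names such as `select_optimal_trees` and `brier_score`.

```python
# A minimal sketch of the tree-selection idea from the abstract (assumed
# names and parameters; binary 0/1 target; X/y are NumPy arrays).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def brier_score(probs, y):
    # Mean squared difference between predicted P(y = 1) and the 0/1 outcome.
    return np.mean((probs - y) ** 2)

def select_optimal_trees(X_train, y_train, X_val, y_val,
                         T=1500, keep_frac=0.2, seed=0):
    rng = np.random.RandomState(seed)
    n, scored = len(X_train), []
    for t in range(T):
        # Grow each tree on a bootstrap sample and score it on its
        # out-of-bag rows, which act as a free validation sample.
        idx = rng.randint(0, n, n)
        oob = np.setdiff1d(np.arange(n), idx)
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=t)
        tree.fit(X_train[idx], y_train[idx])
        scored.append((tree.score(X_train[oob], y_train[oob]), tree))

    # Keep the individually most accurate trees as candidates.
    scored.sort(key=lambda s: s[0], reverse=True)
    candidates = [tree for _, tree in scored[: int(keep_frac * T)]]

    # Starting from the best tree, add a candidate only if it lowers the
    # ensemble's Brier score on the independent validation sample.
    ensemble = [candidates[0]]
    probs_sum = candidates[0].predict_proba(X_val)[:, 1]
    best = brier_score(probs_sum, y_val)
    for tree in candidates[1:]:
        trial = (probs_sum + tree.predict_proba(X_val)[:, 1]) / (len(ensemble) + 1)
        if brier_score(trial, y_val) < best:
            probs_sum += tree.predict_proba(X_val)[:, 1]
            ensemble.append(tree)
            best = brier_score(trial, y_val)
    return ensemble
```

Growing each tree on a bootstrap sample makes the out-of-bag rows a free validation set for ranking individual trees, while the separate validation sample keeps the diversity check independent of the data the trees were fit on.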

Highlights

  • Various authors have suggested that combining weak models leads to efficient ensembles (Schapire 1990; Domingos 1996; Quinlan 1996; Maclin and Opitz 2011; Hothorn and Lausen 2003; Janitza et al. 2015; Gul et al. 2016b; Lausser et al. 2016; Bolón-Canedo et al. 2012; Bhardwaj et al. 2016; Liberati et al. 2017)

  • A total of T = 1500 independent classification and regression trees are grown on bootstrap samples drawn from 90% of the training data, with p features selected at random for splitting the nodes of the trees (a sketch of this partitioning follows the list)

  • For regression problems, our method gives better results than the other methods considered on 7 of the 14 data sets, whereas random forest gives the best performance on 2 data sets, Wine and Abalone
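The first two highlights imply a simple data partition: 90% of the training data supplies the bootstrap samples on which the T = 1500 trees are grown, and the remaining 10% serves as the independent validation sample for the diversity check. A minimal sketch, reusing the hypothetical `select_optimal_trees` helper from the abstract's sketch (`X` and `y` are assumed to be loaded already):

```python
# Hypothetical 90/10 split: trees are grown on bootstrap samples from the
# 90% portion; the held-out 10% is the independent validation sample.
from sklearn.model_selection import train_test_split

X_grow, X_val, y_grow, y_val = train_test_split(X, y, test_size=0.10,
                                                random_state=0)
ensemble = select_optimal_trees(X_grow, y_grow, X_val, y_val, T=1500)
```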


Introduction

Various authors have suggested that combining weak models leads to efficient ensembles (Schapire 1990; Domingos 1996; Quinlan 1996; Maclin and Opitz 2011; Hothorn and Lausen 2003; Janitza et al. 2015; Gul et al. 2016b; Lausser et al. 2016; Bolón-Canedo et al. 2012; Bhardwaj et al. 2016; Liberati et al. 2017). Ensemble methods are effective because different types of models have different inductive biases, and such diversity reduces the variance error without increasing the bias error (Mitchell 1997; Tumer and Ghosh 1996; Ali and Pazzani 1996). Extending this notion, Breiman (2001) suggested growing a large number, T say, of classification and regression trees on bootstrap samples of the training data and aggregating their predictions, by averaging the T outputs in the case of regression and by majority vote over the K classes in the case of classification. Breiman called this method bagging and, with random selections of features at each node, random forest (Breiman 2001).
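For concreteness, this aggregation step can be written in a few lines of Python. The sketch below is a generic illustration, not the paper's code: the list `trees` of fitted trees and non-negative integer class labels are assumptions.

```python
# Generic bagging aggregation: average tree outputs for regression,
# majority vote over the K classes for classification (labels assumed
# to be non-negative integers).
import numpy as np

def bagging_predict(trees, X, task="classification"):
    preds = np.stack([tree.predict(X) for tree in trees])  # shape (T, n)
    if task == "regression":
        return preds.mean(axis=0)                 # average of T predictions
    return np.array([np.bincount(col.astype(int)).argmax()
                     for col in preds.T])         # per-observation majority vote
```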

