Abstract

Tree-based ensemble algorithms (TEAs) have advanced significantly in recent years owing to their simple algorithmic design. However, when the proportion of the ‘most informative’ features is low, the performance of conventional TEAs degrades significantly. The primary reasons for this degradation are that the traditional algorithmic design appears to be biased toward the least informative features and that the sub-space selection procedure admits uninformative features. This paper proposes a logically randomized forest (LRF) algorithm that incorporates two enhancements into existing TEAs. The first enhancement addresses the bias through feature-level engineering. The second changes how individual feature sub-spaces are selected. For the first enhancement, we use the graph-theoretic principle of minimal vertex cover to construct a relevant set of features. The permutation-based feature importance technique is then employed to calculate the ‘informativeness’ of the relevant features, in order to infuse logical randomness into the individual trees in the forest. For the second enhancement, stratified sampling is used to ensure that the most informative features are present in every newly created feature sub-space. Individual trees are then generated using the roulette wheel-based selection (RWS) algorithm. The proposed algorithm is evaluated on two real-world genomic data sets, ten hybrid-synthetic classification data sets, and twenty multidisciplinary benchmark data sets with varying characteristics. The experimental findings indicate that LRF outperforms existing benchmark and state-of-the-art TEAs.
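To make the sub-space selection idea concrete, the following is a minimal Python sketch of stratified feature sub-space sampling combined with roulette wheel draws. It is not the authors' implementation: the abstract does not specify how the strata are defined or exactly where RWS is applied, so the two-stratum split, the function names `roulette_wheel_pick` and `draw_subspace`, the parameters `n_informative` and `top_fraction`, and the importance scores are all assumptions chosen for illustration. The scores are assumed to be non-negative.

```python
import random


def roulette_wheel_pick(candidates, weights, rng):
    """Pick one candidate with probability proportional to its weight.

    `weights` is indexed by candidate id and assumed non-negative.
    """
    total = sum(weights[c] for c in candidates)
    threshold = rng.uniform(0.0, total)
    running = 0.0
    for c in candidates:
        running += weights[c]
        if threshold < running:
            return c
    return candidates[-1]  # guard for round-off or an all-zero pool


def draw_subspace(importance, size, n_informative, top_fraction, rng):
    """Draw one feature sub-space as a sorted list of feature indices.

    Assumption (not from the paper): features are split into two strata,
    the top `top_fraction` by importance and the rest, and every
    sub-space takes `n_informative` features from the top stratum.
    """
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    n_top = max(1, int(round(top_fraction * len(ranked))))
    strata = [(ranked[:n_top], n_informative),
              (ranked[n_top:], size - n_informative)]
    chosen = []
    for pool, quota in strata:
        pool = list(pool)
        for _ in range(min(quota, len(pool))):
            pick = roulette_wheel_pick(pool, importance, rng)
            pool.remove(pick)  # sample without replacement
            chosen.append(pick)
    return sorted(chosen)


# Hypothetical importance scores for ten features; 0, 2 and 8 dominate.
importance = [0.30, 0.02, 0.25, 0.01, 0.03, 0.01, 0.02, 0.01, 0.20, 0.01]
rng = random.Random(0)
subspaces = [draw_subspace(importance, size=4, n_informative=2,
                           top_fraction=0.3, rng=rng) for _ in range(5)]
```

Under these assumptions, each of the five sub-spaces contains exactly two of the three highest-scoring features plus two drawn from the remainder, so no tree would be grown on a sub-space made up only of weak features. In a fuller pipeline the scores would come from a permutation-based importance estimate rather than a hand-written list.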

