Abstract
Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have little or no statistical support. In this paper, we propose a new ensemble method, Random Ordinality Ensembles (ROE), that reduces this problem, and provides significantly improved accuracies over current ensemble methods. We perform a random projection of the categorical data into a continuous space. As the transformation to continuous data is a random process, each dataset has a different imposed ordinality. A decision tree that learns on this new continuous space is able to use binary splits, hence reduces the data fragmentation problem. Generally, these binary trees are accurate. Diverse training datasets ensure diverse decision trees in the ensemble. We created two variants of the technique, ROE. In the first variant, we used decision trees as the base models for ensembles. In the second variant, we combined the attribute randomisation of Random Subspaces with Random Ordinality. These methods match or outperform other popular ensemble methods. Different properties of these ensembles were studied. The study suggests that random ordinality trees are generally more accurate and smaller than multi-way split decision trees. It is also shown that random ordinality attributes can be used to improve Bagging and AdaBoost.M1 ensemble methods.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.