Random Ordinality Ensembles: Ensemble methods for multi-valued categorical data

Amir Ahmad, Gavin Brown

doi:10.1016/j.ins.2014.10.064

Abstract

Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have little or no statistical support. In this paper, we propose a new ensemble method, Random Ordinality Ensembles (ROE), that reduces this problem and provides significantly improved accuracies over current ensemble methods. We perform a random projection of the categorical data into a continuous space. Because the transformation to continuous data is a random process, each transformed dataset has a different imposed ordinality. A decision tree that learns on this new continuous space is able to use binary splits, which reduces the data fragmentation problem. Generally, these binary trees are accurate, and the diverse training datasets ensure diverse decision trees in the ensemble. We created two variants of ROE. In the first variant, we used decision trees as the base models for ensembles. In the second variant, we combined the attribute randomisation of Random Subspaces with Random Ordinality. These methods match or outperform other popular ensemble methods. We also studied different properties of these ensembles. The study suggests that random ordinality trees are generally more accurate and smaller than multi-way split decision trees. It is also shown that random ordinality attributes can be used to improve the Bagging and AdaBoost.M1 ensemble methods.
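
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (random_ordinality_map, encode, best_stump, fit_roe, predict) are hypothetical, the random projection is simplified to a random rank per category, and a one-level binary split (a decision stump) stands in for the full decision trees used in the paper. Each ensemble member draws its own random ordering of the categories, learns a binary split on the resulting numeric data, and the members are combined by majority vote.

```python
import random
from collections import Counter


def random_ordinality_map(values, rng):
    """Give each distinct category a random rank, i.e. an imposed ordering."""
    categories = sorted(set(values))
    ranks = list(range(len(categories)))
    rng.shuffle(ranks)
    return dict(zip(categories, ranks))


def encode(rows, maps):
    """Replace every categorical value with its rank under the given maps."""
    return [[maps[j][v] for j, v in enumerate(row)] for row in rows]


def best_stump(X, y):
    """Find the binary split (attribute, threshold) with fewest training errors.

    Assumption: a stump is used here only to keep the sketch short.
    """
    majority = Counter(y).most_common(1)[0][0]
    # Fallback when no attribute can be split: always predict the majority.
    best = (len(y) + 1, 0, float("inf"), majority, majority)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    return best[1:]


def fit_roe(rows, labels, n_members=15, seed=0):
    """Train one stump per random ordinality; each member keeps its own maps."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        maps = [random_ordinality_map([r[j] for r in rows], rng)
                for j in range(len(rows[0]))]
        members.append((maps, best_stump(encode(rows, maps), labels)))
    return members


def predict(members, row):
    """Majority vote over members (categories unseen in training are not handled)."""
    votes = Counter()
    for maps, (j, t, l_lab, r_lab) in members:
        votes[l_lab if maps[j][row[j]] <= t else r_lab] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    rows = [("red", "s"), ("green", "s"), ("blue", "m"),
            ("yellow", "m"), ("red", "l"), ("blue", "l")]
    labels = ["A", "B", "B", "A", "A", "B"]
    model = fit_roe(rows, labels)
    print([predict(model, r) for r in rows])
```

A threshold on the randomly ranked attribute divides the categories into two groups, so each node needs only a binary split instead of one branch per category. Because every member sees a different ordering, the groupings available to it differ, which is the source of diversity the abstract describes.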

Journal: Information Sciences
Publication Date: Nov 5, 2014
Citations: 9

Similar Papers

Random Ordinality Ensembles: A Novel Ensemble Method for Multi-valued Categorical Data
Amir Ahmad ... Gavin Brown
01 Jan 2009

Random Projection Random Discretization Ensembles—Ensembles of Linear Multivariate Decision Trees
Amir Ahmad ... Gavin Brown
IEEE Transactions on Knowledge and Data Engineering | VOL. 26
01 May 2014

Prediction of diabetic protein markers based on an ensemble method
...
Frontiers in Bioscience-Landmark | VOL. 26
01 Jan 2020

Detecting Chronic Kidney Disease Using Machine Learning
Manoj Reddy ... John Cho
01 Jan 2015
