Abstract

Existing ordinal trees and random forests typically use scores that are assigned to the ordered categories, which implicitly treats the response as if it were measured on a higher scale level. Versions of ordinal trees are proposed that take the scale level seriously and avoid the assignment of artificial scores. The construction principle is based on an investigation of the binary models that are implicitly used in parametric ordinal regression. These building blocks can be fitted by trees and combined in a similar way as in parametric models. The resulting trees use only the ordinal scale level. Since binary trees and random forests are constituent elements of the proposed trees, one can exploit the wide range of binary trees that have already been developed. A further topic is the potentially poor performance of random forests, which seems to have been neglected in the literature. Ensembles that include parametric models are proposed to obtain prediction methods that tend to perform well in a wide range of settings. The performance of the methods is evaluated empirically using several data sets.
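The idea of building ordinal predictions from binary pieces can be illustrated with a minimal sketch. It assumes k ordered categories 1, ..., k and two kinds of binary representations familiar from parametric ordinal regression: the cumulative splits Y > r (as in the cumulative model) and the conditional splits Y > r given Y ≥ r (as in the sequential model). In practice each binary probability would come from a binary tree or forest fitted to the corresponding split; here the estimates are simply passed in as numbers. All function names are hypothetical, and the monotonicity fix for the cumulative case is an assumption for illustration, not necessarily the paper's procedure.

```python
from typing import List, Tuple

def split_responses(y: List[int], k: int) -> List[List[int]]:
    """Cumulative split variables: row r-1 holds 1{y_i > r} for r = 1, ..., k-1."""
    return [[1 if yi > r else 0 for yi in y] for r in range(1, k)]

def conditional_split_data(y: List[int], k: int) -> List[Tuple[List[int], List[int]]]:
    """Conditional splits: for r = 1..k-1, the indices with y >= r and their response 1{y > r}."""
    out = []
    for r in range(1, k):
        idx = [i for i, yi in enumerate(y) if yi >= r]
        out.append((idx, [1 if y[i] > r else 0 for i in idx]))
    return out

def probs_from_cumulative_splits(p_greater: List[float]) -> List[float]:
    """Combine estimates of P(Y > r), r = 1..k-1, into category probabilities.

    Separately fitted trees need not give non-increasing estimates, so each
    value is capped by its predecessor before differencing (assumed fix).
    """
    surv = [1.0]  # P(Y > 0) = 1
    for p in p_greater:
        surv.append(min(surv[-1], p))  # enforce P(Y > r) <= P(Y > r - 1)
    surv.append(0.0)  # P(Y > k) = 0
    # P(Y = r) = P(Y > r - 1) - P(Y > r)
    return [surv[r] - surv[r + 1] for r in range(len(surv) - 1)]

def probs_from_conditional_splits(p_cond: List[float]) -> List[float]:
    """Combine estimates of P(Y > r | Y >= r), r = 1..k-1, into category probabilities."""
    probs, reach = [], 1.0  # reach holds P(Y >= r)
    for q in p_cond:
        probs.append(reach * (1.0 - q))  # P(Y = r) = P(Y >= r) * P(Y = r | Y >= r)
        reach *= q
    probs.append(reach)  # P(Y = k) = P(Y >= k)
    return probs
```

For example, `split_responses([1, 3, 2], 3)` gives the two binary responses `[0, 1, 1]` (for Y > 1) and `[0, 1, 0]` (for Y > 2), and `probs_from_conditional_splits([0.6, 0.5])` combines the two conditional estimates into probabilities for three categories that sum to one. Because only the order of the categories enters through the splits, no scores are assigned at any step.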

Highlights

  • There is a long tradition of analyzing ordinal response data by using parametric models, which started with the seminal paper of McCullagh (1980)

  • In the retinopathy data set, with explanatory variables smoking (SM = 1: smoker, SM = 0: non-smoker), diabetes duration (DIAB) in years, glycosylated hemoglobin (GH) in percent, and diastolic blood pressure (BP) in mmHg, fitting trees to the two splits yields the trees shown in Fig. 1 (fitted using ctree, Hothorn et al. (2006))

  • While the use of parametric models in ensembles seems to have been neglected, there are several proposals on how to form ensembles from trees (see, for example, the weighted random forests proposed by Winham et al. (2013) and the ensembles considered by Khan et al. (2020))


Summary

Introduction

There is a long tradition of analyzing ordinal response data by using parametric models, which started with the seminal paper of McCullagh (1980). Existing ordinal trees and random forests, however, typically assign scores to the ordered categories. The assignment of scores can be warranted in some cases, in particular if ordinal responses are built from continuous variables by grouping. It is rather artificial and arbitrary for genuinely ordinal response data, for example, if the response represents ordered levels of severity of a disease. Versions of random forests without scores were proposed more recently by Buri and Hothorn (2020). They use the ordinal proportional odds model to obtain the statistics that are used in splitting. In the following, we propose ensembles that include parametric models to provide a stable prediction tool that works well across a wide range of data sets. The paper thus has two objectives: introducing score-free recursive partitioning and random forests, and proposing ensembles that include parametric models.
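An ensemble that includes parametric models can be sketched, under simple assumptions, as a weighted average of the category probabilities predicted by its members, for example a score-free forest and a parametric cumulative model. The equal default weights and the function names below are hypothetical; this section does not state how the paper weights the members. The ranked probability score is included as one accuracy measure that uses only the order of the categories; whether it matches the measures used in the paper is likewise an assumption.

```python
from typing import List, Optional, Sequence

def ensemble_probs(member_probs: Sequence[Sequence[float]],
                   weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of category-probability vectors from several learners."""
    m = len(member_probs)
    if weights is None:
        weights = [1.0 / m] * m  # equal weights (an assumption)
    total = sum(weights)
    k = len(member_probs[0])
    return [sum(w * p[j] for w, p in zip(weights, member_probs)) / total
            for j in range(k)]

def ranked_probability_score(probs: Sequence[float], y: int) -> float:
    """Unnormalized ranked probability score for observed category y in 1..k (lower is better).

    Sums (F(r) - 1{y <= r})^2 over r = 1..k-1, where F is the predicted
    cumulative distribution, so only the ordering of categories matters.
    """
    score, cum_p = 0.0, 0.0
    for r in range(1, len(probs)):
        cum_p += probs[r - 1]
        score += (cum_p - (1.0 if y <= r else 0.0)) ** 2
    return score

# Hypothetical predictions of two members for one observation with k = 3.
forest = [0.6, 0.3, 0.1]
parametric = [0.2, 0.5, 0.3]
combined = ensemble_probs([forest, parametric])
for name, probs in [("forest", forest), ("parametric", parametric), ("ensemble", combined)]:
    print(name, round(ranked_probability_score(probs, y=2), 4))
```

Averaging probabilities rather than predicted classes keeps the result a proper distribution over the ordered categories. The intended effect is that the ensemble tends to avoid the worst case of either member in settings where one of them performs poorly.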

Binary Representations of Ordinal Responses
Recursive Partitioning Based on Splits
Trees for Split Variables
Trees for Conditional Splits
From Trees to Random Forests
Ensemble Learners Including Parametric Models
Measuring Accuracy of Prediction
Heart Data
Wine Data
Housing Data
Birth Weight Data
Retinopathy Data
Medical Care
GLES Data
Safety Data
Ensembles at Work
Importance of Variables
Findings
Concluding Remarks