Abstract

Ensemble methods can be used to identify causal relationships in data, supporting better understanding and sounder decisions in high-risk processes. This paper explores the idea of a causal decision tree forest and proposes a regularized ensemble method that integrates optimal causal trees to improve prediction accuracy without compromising the accuracy of heterogeneous treatment effect estimates. The proposed method selects a subset of the most accurate causal trees from a sufficiently large pool, based on their out-of-sample error estimates. The selected trees are combined into an ensemble that is used to estimate heterogeneous treatment effects and to predict unseen data. As an example, the method is applied to Pakistan’s income function, a dataset of 27,964 observations on the wages of workers aged 10 and above. The paper also gives a detailed simulation study in which datasets are generated under five different designs. The proposed method is assessed against ordinary least squares (OLS), the least absolute shrinkage and selection operator (LASSO), Ridge, the causal tree and the standard decision tree forest (i.e. the causal forest), using mean square error (MSE), root mean square error (RMSE), mean absolute deviation (MAD) and Pearson correlation (r) as performance metrics. The analyses reveal that the proposed method can be used effectively for estimating heterogeneous treatment effects and achieves better prediction performance than the other methods considered.
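The selection step described in the abstract (grow a large pool of trees, rank them by out-of-sample error, keep the best subset, average them) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a randomised treatment with known propensity 0.5, uses single-split regression trees fitted to a transformed outcome as a stand-in for causal trees, scores each tree by its out-of-bag squared error against that transformed outcome, and keeps an arbitrary 25% of the pool. All function names and the simulated data are hypothetical.

```python
import random
from statistics import fmean

def transformed_outcome(y, w, p=0.5):
    # With randomised treatment and known propensity p, E[y* | x] equals the
    # conditional treatment effect at x, so a regression fit to y* targets it.
    return [yi * (wi - p) / (p * (1 - p)) for yi, wi in zip(y, w)]

def fit_stump(rows, target, feature):
    """Least-squares single split on one covariate (toy stand-in for a causal tree)."""
    values = sorted({r[feature] for r in rows})
    best = None
    for t in values[1::5]:  # thin the candidate thresholds for speed
        left = [v for r, v in zip(rows, target) if r[feature] < t]
        right = [v for r, v in zip(rows, target) if r[feature] >= t]
        ml, mr = fmean(left), fmean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda r: ml if r[feature] < t else mr

def grow_pool(rows, target, n_trees, rng):
    """Grow trees on bootstrap samples and score each on its out-of-bag rows."""
    n, pool = len(rows), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        feature = rng.randrange(len(rows[0]))  # one random covariate per tree
        tree = fit_stump([rows[i] for i in idx], [target[i] for i in idx], feature)
        # Assumed error proxy: squared gap between the tree's effect estimate
        # and the transformed outcome on the out-of-bag rows.
        err = fmean([(tree(rows[i]) - target[i]) ** 2 for i in oob])
        pool.append((err, tree))
    return pool

def select_best(pool, fraction):
    """Keep the given fraction of trees with the smallest out-of-bag error."""
    keep = max(1, int(len(pool) * fraction))
    return sorted(pool, key=lambda item: item[0])[:keep]

def predict_effect(trees, row):
    """Average the effect estimates of the selected trees."""
    return fmean([tree(row) for _, tree in trees])

# Simulated data: the effect is 2 when the first covariate exceeds 0.5, else 0.
rng = random.Random(0)
rows = [[rng.random() for _ in range(3)] for _ in range(300)]
w = [rng.randint(0, 1) for _ in rows]
y = [(2.0 if r[0] > 0.5 else 0.0) * wi + rng.gauss(0, 0.5) for r, wi in zip(rows, w)]

target = transformed_outcome(y, w)
pool = grow_pool(rows, target, n_trees=40, rng=rng)
selected = select_best(pool, fraction=0.25)
effects = [predict_effect(selected, r) for r in rows]
```

In this toy setup only the first covariate drives the treatment effect, so trees that split on it tend to have lower out-of-bag error and are more likely to survive the selection step.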

Highlights

  • The identification of causal relationships in data is key to better understanding and to accurate decisions in processes that involve risk

  • The proposed optimal causal trees ensemble (OCTE) is assessed using five different simulation scenarios. It is compared with five state-of-the-art methods, i.e., ordinary least squares (OLS), least absolute shrinkage and selection operator (LASSO), Ridge, causal tree and causal random forest

  • The OCTE is applied to a real dataset, the nationally representative Labor Force Survey of Pakistan (LFSP)
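The method comparison relies on the four performance metrics named in the abstract: MSE, RMSE, MAD and Pearson correlation. A minimal sketch of how they could be computed follows. It assumes MAD is read as the mean absolute difference between observed and predicted values, which the text does not define explicitly; the function names and the toy numbers are hypothetical.

```python
from math import sqrt
from statistics import fmean

def mse(actual, predicted):
    """Mean square error."""
    return fmean([(a - p) ** 2 for a, p in zip(actual, predicted)])

def rmse(actual, predicted):
    """Root mean square error."""
    return sqrt(mse(actual, predicted))

def mad(actual, predicted):
    # Assumption: MAD is taken here as the mean absolute prediction error.
    return fmean([abs(a - p) for a, p in zip(actual, predicted)])

def pearson_r(actual, predicted):
    """Pearson correlation between observed and predicted values."""
    ma, mp = fmean(actual), fmean(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    va = sum((a - ma) ** 2 for a in actual)
    vp = sum((p - mp) ** 2 for p in predicted)
    return cov / sqrt(va * vp)

# Toy values for illustration only, not results from the paper.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
scores = {
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAD": mad(actual, predicted),
    "r": pearson_r(actual, predicted),
}
```

Lower MSE, RMSE and MAD indicate better predictions, while a Pearson correlation closer to 1 indicates closer linear agreement between predicted and observed values.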

Summary

Introduction

The identification of causal relationships in data is key to better understanding and to accurate decisions in processes that involve risk. Machine learning methods are sometimes used for more than prediction, for example to represent and discover causal relationships in data and to estimate heterogeneous causal effects. This kind of application provides a compact and precise graphical representation of the causal relationships between a set of predictor attributes and an outcome attribute. Typical examples include classification and regression trees, k-nearest neighbours models and support vector machines.
