ABSTRACT
Recent advances in data complexity and availability present both challenges and opportunities for automated data exploration. Tree‐based methods, known for their interpretability, are widely used for building regression and classification models. However, they often lag behind the best supervised learning approaches in prediction accuracy. To address this limitation, ensemble methods such as random forests combine multiple trees to improve prediction accuracy, though at the cost of interpretability. While tree‐based methods have seen extensive use in various fields, their application to complex survey data has been relatively limited. This article provides an overview of state‐of‐the‐art tree‐based approaches for analyzing complex survey data. It distinguishes methods explicitly designed for survey contexts from those adapted from other domains. The discussion covers applications in model‐assisted approaches, disclosure limitation, and small area estimation, as well as other recent methodological developments tailored to survey data. Additionally, the article explores aggregated tree models that sacrifice interpretability for improved prediction accuracy. These models, such as bagging, random forests, and boosting, are explained, along with the concept of out‐of‐bag error for model evaluation. Finally, the article traces the history and development of tree models, from their origins in regression trees to more recent Bayesian approaches and aggregated tree models. This overview sheds light on the potential utility of tree‐based methods in survey methodology and provides insights into future research directions in this evolving field.