Model-assisted estimators have gained significant attention because they make efficient use of auxiliary information during estimation. These estimators rely on a working model that links the survey variable to the auxiliary variables. The model is fitted to the sample data, and the resulting predictions are then incorporated into the estimation procedure. In this study, we explore several model-assisted estimators: Generalized Regression (GREG), Ridge regression, Lasso regression, CART (Classification and Regression Tree), Random Forest, Cubist, and Principal Components Regression (PCR). The analysis used 2,000 samples of size 50 (n/N ≈ 10%). A stepwise variable selection method identified the most significant auxiliary variables and added them to the model incrementally. Estimator performance was assessed using relative bias (RB), relative root mean square error (RRMSE), and relative efficiency (RE). Our findings show that tree-based estimators such as CART and Random Forest, and penalized regression estimators such as Ridge and Lasso, remain robust as the number of auxiliary variables increases. Among all the estimators, Random Forest consistently yielded the lowest RRMSE, particularly with five auxiliary variables, demonstrating superior efficiency. Conversely, the performance of the GREG estimator deteriorated as the number of auxiliary variables increased. This study underscores the importance of choosing model-assisted estimation procedures suited to the data characteristics and to the relationship between the survey and auxiliary variables in this high-dimensional dataset.
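The general recipe (fit a working model on the sample, predict for the whole population, then correct with the sample residuals) can be illustrated with a small Monte Carlo sketch. This is not the authors' code. It assumes:

- a synthetic population of N = 500 units, so that n = 50 gives n/N ≈ 10%;
- a single auxiliary variable;
- simple random sampling without replacement (SRSWOR);
- a linear working model fitted by OLS, which is GREG-like.

The reference estimator is the expansion (sample-mean) estimator. RE is taken here as MSE(reference) / MSE(model-assisted), which is one common convention. All function names are hypothetical. Swapping `ols_fit` for any other fitting routine that returns a prediction function shows how the estimators listed above differ only in their working model.

```python
import math
import random

def ols_fit(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def model_assisted_mean(x_pop, x_sample, y_sample, fit):
    """Difference-type model-assisted estimator of the population mean under SRSWOR:
    mean of model predictions over the population plus mean of sample residuals."""
    predict = fit(x_sample, y_sample)
    pop_pred = sum(predict(x) for x in x_pop) / len(x_pop)
    resid = sum(y - predict(x) for x, y in zip(x_sample, y_sample)) / len(y_sample)
    return pop_pred + resid

def simulate(N=500, n=50, R=2000, seed=1):
    rng = random.Random(seed)
    # Hypothetical synthetic population: y depends linearly on x plus Gaussian noise.
    x_pop = [rng.uniform(10, 50) for _ in range(N)]
    y_pop = [5 + 2 * x + rng.gauss(0, 8) for x in x_pop]
    truth = sum(y_pop) / N
    est_ma, est_ref = [], []
    for _ in range(R):
        idx = rng.sample(range(N), n)  # SRSWOR draw of n unit indices
        xs = [x_pop[i] for i in idx]
        ys = [y_pop[i] for i in idx]
        est_ref.append(sum(ys) / n)  # expansion estimator (no auxiliary information)
        est_ma.append(model_assisted_mean(x_pop, xs, ys, ols_fit))
    return truth, est_ma, est_ref

def metrics(estimates, truth):
    """Return relative bias, relative RMSE, and MSE over the simulated samples."""
    R = len(estimates)
    rb = sum(e - truth for e in estimates) / (R * truth)
    mse = sum((e - truth) ** 2 for e in estimates) / R
    return rb, math.sqrt(mse) / truth, mse

truth, ma, ref = simulate()
rb_ma, rrmse_ma, mse_ma = metrics(ma, truth)
rb_ref, rrmse_ref, mse_ref = metrics(ref, truth)
re = mse_ref / mse_ma  # RE > 1 means the model-assisted estimator is more efficient
print(f"model-assisted: RB={rb_ma:.5f} RRMSE={rrmse_ma:.5f}")
print(f"expansion:      RB={rb_ref:.5f} RRMSE={rrmse_ref:.5f}  RE={re:.2f}")
```

Because the synthetic y is strongly correlated with x, the linear working model should give a much smaller RRMSE than the expansion estimator. Both estimators should show near-zero relative bias. The exact numbers depend on the seed and on the assumed population.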