Multiple imputation of missing values is a key step in data analytics and a standard process in data science. Nonlinear imputation methods come into play whenever the linear relationship between a response and predictors cannot be linearized by transformations of variables, adding interactions, or using, e.g., quadratic terms. Generalized additive models (GAM) and its extension, GAMLSS—where each parameter of the distribution, such as mean, variance, skewness, and kurtosis, can be represented as a function of predictors, are widely used nonlinear methods. However, non-robust methods such as standard GAM’s and GAMLSS’s can be swayed by outliers, leading to outlier-driven imputations. This can apply concerning both representative outliers—those true yet unusual values of your population—and non-representative outliers, which are mere measurement errors. Robust (imputation) methods effectively manage outliers and exhibit resistance to their influence, providing a more reliable approach to dealing with missing data. The innovative solution of the proposed new imputation algorithm tackles three major challenges related to robustness. (1) A robust bootstrap method is employed to handle model uncertainty during the imputation of a random sample. (2) The approach incorporates robust fitting techniques to enhance accuracy. (3) It effectively considers imputation uncertainty in a resilient manner. Furthermore, any complex model for any variable with missingness can be considered and run through the algorithm. For the real-world data sets used and the simulation study conducted, the novel algorithm imputeRobust which includes robust methods for imputation with GAM’s demonstrates superior performance compared to existing imputation methods using GAMLSS. Limitations pertain to the imputation of categorical variables using robust techniques.
Read full abstract