Abstract
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of MI as well as a new proposed one, nine in all, in a setting of several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principle component analysis are also included in the study. Consequently, not only the comparison of imputation methods but also the comparison of outlier detection methods is accomplished in this study. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as the size and dimension of the data and the fraction of missing components. By taking these features into account, an MI method for a given data set can be selected more optimally.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.