Abstract
We describe imputation strategies that are resistant to outliers, obtained by modifying the simple imputation method proposed by Krzanowski, and assess their performance. The strategies use a robust singular value decomposition, do not depend on distributional or structural assumptions, and impose no restrictions on the pattern or mechanism of the missing data. They are tested by simulating contamination and unbalance, both in artificially generated matrices and in a matrix of real data from an experiment with genotype-by-environment interaction. Their performance is assessed by means of prediction errors, the squared cosine between matrices, and a quality coefficient of fit between imputations and true values. For small matrices, the best results are obtained by applying the robust decomposition directly, while for larger matrices the highest quality is obtained by eliminating the singular values of the imputation equation.
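As an illustration of two of the assessment measures named above, the following is a minimal sketch, not the authors' code. It assumes that the prediction error is a root mean squared error over the imputed cells and that the squared cosine between matrices is taken under the trace inner product; the paper's exact definitions may differ. The function names `prediction_error` and `squared_cosine` are hypothetical. The quality coefficient of fit is not attempted here because its definition is not given in this text.

```python
import numpy as np


def prediction_error(true_vals, imputed_vals):
    """Root mean squared error between true and imputed values (assumed definition)."""
    t = np.asarray(true_vals, dtype=float)
    p = np.asarray(imputed_vals, dtype=float)
    return float(np.sqrt(np.mean((t - p) ** 2)))


def squared_cosine(a, b):
    """Squared cosine between two equally shaped matrices (assumed definition).

    Uses the trace inner product <A, B> = trace(A^T B), which equals the
    sum of the elementwise products of A and B.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    inner = np.sum(a * b)
    return float(inner ** 2 / (np.sum(a * a) * np.sum(b * b)))
```

Under these assumed definitions, a squared cosine close to 1 indicates that the completed matrix is nearly proportional to the true matrix, and a prediction error close to 0 indicates that the imputed cells are close to the true values.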
Highlights
Author affiliation: College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter EX4 4QF, UK
We describe imputation strategies resistant to outliers, obtained by modifying the simple imputation method proposed by Krzanowski, and assess their performance
SVD88 is highly affected by contamination, producing low-quality imputations
We have focused on nonparametric imputation based on the singular value decomposition (SVD) [10,11]
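To make the idea of SVD-based imputation concrete, the sketch below fills missing cells by repeatedly replacing them with the values of a low-rank SVD approximation. This is a generic iterative scheme chosen for illustration. It is neither Krzanowski's method nor any of the robust variants studied in the paper, and it uses the ordinary (non-robust) SVD. The function name `svd_impute` and its parameters `rank`, `max_iter`, and `tol` are hypothetical. Missing cells are assumed to be coded as NaN, and every column is assumed to contain at least one observed value.

```python
import numpy as np


def svd_impute(x, rank=1, max_iter=200, tol=1e-8):
    """Fill NaN cells using an iterated rank-`rank` SVD approximation."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)

    # Initial guess: mean of the observed values in each column.
    col_means = np.nanmean(x, axis=0)
    filled = np.where(missing, col_means, x)

    for _ in range(max_iter):
        # Ordinary SVD of the currently completed matrix.
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]

        # Update only the missing cells; observed cells stay fixed.
        updated = np.where(missing, approx, x)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break

    return filled
```

A single outlying observed cell can distort the leading singular vectors in a scheme like this, which is the kind of sensitivity to contamination that motivates replacing the ordinary SVD with a robust one.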
Summary
Modern literature on the analysis of incomplete data recommends completing matrices using methods based on either maximum likelihood or multiple imputation [5]. These procedures can depend heavily on probability distributions (for example, the multivariate normal) and on the missing data mechanism [6].