Abstract

We describe imputation strategies that are resistant to outliers, obtained by modifying the simple imputation method proposed by Krzanowski, and assess their performance. The strategies use a robust singular value decomposition, do not depend on distributional or structural assumptions, and place no restrictions on the pattern or mechanism of missing data. They are tested through the simulation of contamination and unbalance, both in artificially generated matrices and in a matrix of real data from an experiment with genotype-by-environment interaction. Their performance is assessed by means of prediction errors, the squared cosine between matrices, and a quality coefficient of fit between imputations and true values. For small matrices, the best results are obtained by applying the robust decomposition directly, while for larger matrices the highest quality is obtained by eliminating the singular values from the imputation equation.
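As an illustration of two of the assessment criteria named above, the following is a minimal sketch in plain Python. The exact formulas used in the paper are not given in this excerpt, so the definitions here are assumptions: the squared cosine is taken under the usual trace (Frobenius) inner product, and the prediction error is taken as a root mean squared error over the imputed cells. The function names `squared_cosine` and `prediction_rmse` and the `missing_cells` argument are hypothetical.

```python
import math


def squared_cosine(x_true, x_imputed):
    """Squared cosine between two equally sized matrices (lists of rows).

    Assumed definition: trace(A^T B)^2 / (trace(A^T A) * trace(B^T B)).
    """
    inner = sum(a * b
                for row_a, row_b in zip(x_true, x_imputed)
                for a, b in zip(row_a, row_b))
    norm_a = sum(a * a for row in x_true for a in row)
    norm_b = sum(b * b for row in x_imputed for b in row)
    return inner ** 2 / (norm_a * norm_b)


def prediction_rmse(x_true, x_imputed, missing_cells):
    """Root mean squared error over the (row, column) cells that were imputed.

    Assumed definition; the paper may normalise its prediction error differently.
    """
    squared = [(x_true[i][j] - x_imputed[i][j]) ** 2 for i, j in missing_cells]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (made up for illustration): cell (0, 1) was missing and imputed as 2.5.
x_true = [[1.0, 2.0], [3.0, 4.0]]
x_imputed = [[1.0, 2.5], [3.0, 4.0]]
missing_cells = [(0, 1)]

cos2 = squared_cosine(x_true, x_imputed)
rmse = prediction_rmse(x_true, x_imputed, missing_cells)
```

A squared cosine close to 1 indicates that the completed matrix is nearly proportional to the true one, while the prediction error measures the discrepancy only on the cells that were actually imputed.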

Highlights

  • We describe imputation strategies that are resistant to outliers, obtained by modifying the simple imputation method proposed by Krzanowski, and assess their performance

  • SVD88 is highly affected by contamination, producing low-quality imputations

  • We have focused on nonparametric imputation based on the singular value decomposition (SVD) [10,11]


Summary

Introduction

College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter EX4 4QF, UK

Modern literature on incomplete information analysis recommends completing matrices using methods that employ either maximum likelihood or multiple imputation [5]. These procedures can depend heavily on probability distributions (for example, multivariate normal) and on the missing data mechanisms [6].

