High-breakdown estimation of multivariate mean and covariance with missing observations.

Tsung‐Chi Cheng, Maria‐Pia Victoria‐Feser

doi:10.1348/000711002760554615

Abstract

We consider the problem of outliers in incomplete multivariate data when the aim is to estimate a measure of mean and covariance, as is the case, for example, in factor analysis. The ER algorithm of Little and Smith, which combines the EM algorithm for missing data with a robust estimation step based on an M-estimator, could be used in such a situation. However, the ER algorithm as originally proposed can fail to be robust in some cases, especially in high dimensions. We propose two alternatives that avoid this problem. The first combines a small modification of the ER algorithm with a so-called high-breakdown estimator as the starting point for the iterative procedure; the second bases the estimation step of the ER algorithm on a high-breakdown estimator. Among the high-breakdown estimators that are designed to retain their robustness properties even when the number of variables is relatively large, we consider the minimum covariance determinant estimator and the t-biweight S-estimator. Simulated and real data are used to compare and illustrate the different procedures.
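The general idea of alternating an EM-type imputation step with a robust reweighting step can be illustrated with a minimal sketch. This is not the authors' procedure nor the exact ER algorithm of Little and Smith: the function name `er_sketch`, the Huber-type weight, the cutoff `sqrt(p_i) + sqrt(2)`, the weighting of the conditional covariance term, and the median/variance default start are all assumptions chosen for illustration. The optional `mu0` and `sigma0` arguments mark where a high-breakdown starting estimate could be supplied.

```python
import numpy as np

def er_sketch(X, mu0=None, sigma0=None, n_iter=100, tol=1e-8):
    """Simplified ER-style iteration for data with NaN-coded missing values.

    Illustrative only: the weight function and cutoff are assumptions.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Default start (assumed, not robust in general): column medians and
    # a diagonal matrix of column variances over the observed values.
    mu = np.nanmedian(X, axis=0) if mu0 is None else np.asarray(mu0, float)
    sigma = (np.diag(np.nanvar(X, axis=0)) if sigma0 is None
             else np.asarray(sigma0, float))

    for _ in range(n_iter):
        X_hat = X.copy()
        C_sum = np.zeros((p, p))
        w = np.zeros(n)
        for i in range(n):
            m, o = miss[i], ~miss[i]
            if not o.any():
                # Fully missing row: carries no information, weight stays 0.
                X_hat[i] = mu
                continue
            S_oo = sigma[np.ix_(o, o)]
            r = X[i, o] - mu[o]
            sol = np.linalg.solve(S_oo, r)
            # Mahalanobis distance computed on the observed coordinates only.
            d = np.sqrt(max(r @ sol, 0.0))
            c = np.sqrt(o.sum()) + np.sqrt(2.0)  # assumed cutoff
            w[i] = 1.0 if d <= c else c / d      # Huber-type weight
            if m.any():
                S_mo = sigma[np.ix_(m, o)]
                # E-step: conditional mean of the missing coordinates.
                X_hat[i, m] = mu[m] + S_mo @ sol
                # Conditional covariance of the missing coordinates.
                C = sigma[np.ix_(m, m)] - S_mo @ np.linalg.solve(S_oo, S_mo.T)
                C_sum[np.ix_(m, m)] += w[i] ** 2 * C
        # Robust step: weighted mean and weighted covariance.
        mu_new = (w[:, None] * X_hat).sum(axis=0) / w.sum()
        R = (X_hat - mu_new) * w[:, None]
        sigma_new = (R.T @ R + C_sum) / (w ** 2).sum()
        delta = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if delta < tol:
            break
    return mu, sigma
```

Because the weights in this sketch decrease only slowly with distance and depend on the current estimates, the result can be sensitive to the starting values, which is consistent with the abstract's motivation for a high-breakdown start passed through `mu0` and `sigma0`.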
