Abstract
Detecting outliers in a data set and deciding how to deal with them are important problems in statistics. The minimum volume ellipsoid (MVE) estimator is a robust estimator of location and covariance structure; however, its use has been limited because there are few computationally attractive methods. Determining the MVE consists of two parts: finding the subset of points to be used in the estimate and finding the ellipsoid that covers this set. This article addresses the first problem. Our method also allows us to compute the minimum covariance determinant (MCD) estimator. The proposed method of subset selection is called the effective independence distribution (EID) method, which chooses the subset by minimizing determinants of matrices containing the data. This method is deterministic, yielding reproducible estimates of location and scatter for a given data set. The EID method of finding the MVE is applied to several regression data sets where the true estimate is known. Results show that the EID method, when applied to these data sets, produces the subset of data more quickly than conventional procedures and that there is less than 6% relative error in the estimates. We also give timing results illustrating the feasibility of our method for larger data sets. For the case of 10,000 points in 10 dimensions, the compute time is under 25 minutes.
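
To make the idea of determinant-driven subset selection concrete, the following is a minimal sketch of a greedy backward deletion. It is an illustration under stated assumptions, not the EID algorithm described in the article: the function name `greedy_min_det_subset` and the parameter `h` (the number of points to keep) are hypothetical. The sketch relies on a standard identity: deleting the point with the largest leverage from a centred data matrix gives the largest one-step reduction in the determinant of the scatter matrix. The mean and covariance of the retained points then serve as an MCD-style estimate of location and scatter.

```python
import numpy as np

def greedy_min_det_subset(X, h):
    """Hypothetical sketch: drop one point at a time until h points remain,
    each time removing the point whose deletion most reduces the determinant
    of the scatter matrix of the current subset."""
    X = np.asarray(X, dtype=float)
    keep = np.arange(X.shape[0])          # indices of points still in the subset
    while keep.size > h:
        S = X[keep]
        Z = S - S.mean(axis=0)            # centre on the current subset mean
        # Leverage of point i is z_i^T (Z^T Z)^{-1} z_i.
        G = np.linalg.solve(Z.T @ Z, Z.T)         # shape (p, m)
        leverage = np.einsum("ij,ji->i", Z, G)    # diagonal of Z @ G
        # Largest leverage <=> largest drop in det(Z^T Z) after deletion.
        keep = np.delete(keep, np.argmax(leverage))
    S = X[keep]
    return keep, S.mean(axis=0), np.cov(S, rowvar=False)

# Usage: a 3 x 3 grid of inliers plus one gross outlier.
grid = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
data = np.array(grid + [(10, 10)], dtype=float)
subset, location, scatter = greedy_min_det_subset(data, h=9)
```

Like the method in the abstract, this sketch is deterministic: the same data always yield the same subset. A greedy pass of this kind can be misled when several outliers mask one another, so it should be read only as a picture of the determinant criterion.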
More From: Journal of Computational and Graphical Statistics