Abstract

One of the most essential topics in robust statistics is the robust estimation of location and covariance. Many popular robust location and scatter estimators, such as Fast-MCD, MVE, and MZE, require at least a convex distribution of the underlying data. For non-convex data distributions these approaches may lead to suboptimal results, because they apply Mahalanobis distances with respect to the location and covariance of a suitably chosen subsample of the data, which implies a convex structure. The approach presented here avoids this drawback by using Euclidean distances. The data set is treated as a complete network, and the minimum spanning tree (MST) of this network is calculated. Based on the MST, a subset of relevant points (thought of as an “outlier-free” subsample of minimum size) is determined, which can then be used for calculating data characteristics. It is shown that the approach has a maximum breakdown point. Additionally, a simulation study provides insights into the approach’s behaviour with respect to increasing dimension and sample size.
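
The MST-based subset selection can be illustrated with a minimal Python sketch. The abstract does not state how the subset is derived from the MST, so the pruning rule used here is an assumption: the longest MST edges are removed one at a time for as long as the largest remaining connected component still contains at least h points. The function names (euclidean_mst, largest_component, pruned_mst_subset) and the parameter h are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def euclidean_mst(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a list of (length, i, j) tuples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known distance from the tree to each point
    parent = [-1] * n       # tree vertex realising that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest point that is not yet part of the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def largest_component(n, edges):
    """Indices of the largest connected component of the given edge set."""
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in edges:
        root[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return max(groups.values(), key=len)


def pruned_mst_subset(points, h):
    """ASSUMED pruning rule (not taken from the abstract): drop the longest
    MST edges while the largest component keeps at least h points."""
    n = len(points)
    edges = sorted(euclidean_mst(points))
    subset = list(range(n))
    while edges:
        candidate = largest_component(n, edges[:-1])
        if len(candidate) < h:
            break
        edges.pop()          # removing the longest edge is still admissible
        subset = candidate
    return sorted(subset)


# Five clustered points plus two distant points; h = 5 is an illustrative choice.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10), (-8, 9)]
subset = pruned_mst_subset(points, h=5)
print(subset)  # [0, 1, 2, 3, 4]
```

In this toy example the two long edges that attach the distant points are pruned first, so the retained subset is the tight cluster. Location and covariance could then be computed from the retained points only.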
