Robust estimation of location and scatter by pruning the minimum spanning tree

Thomas Kirschstein, Steffen Liebscher, Claudia Becker

doi:10.1016/j.jmva.2013.05.004

Open Access

https://doi.org/10.1016/j.jmva.2013.05.004

Abstract

One of the most essential topics in robust statistics is the robust estimation of location and covariance. Many popular robust (location and scatter) estimators such as Fast-MCD, MVE, and MZE require at least a convex distribution of the underlying data. For non-convex data distributions these approaches may lead to a suboptimal result, because they apply Mahalanobis distances with respect to the location and covariance of a suitably chosen subsample of the data, which implies a convex structure. The approach presented here addresses this drawback by using Euclidean distances. The data set is treated as a complete network, and the minimum spanning tree (MST) of this network is calculated. Based on the MST, a subset of relevant points (thought of as an “outlier-free” subsample of minimum size) is determined, which can then be used for calculating data characteristics. It is shown that the approach has a maximum breakdown point. Additionally, a simulation study provides insights into the approach’s behaviour with respect to increasing dimension and size.
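To make the procedure in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It builds the MST of the complete Euclidean graph, removes the longest edges, and uses the largest remaining connected component as the subsample for the mean and covariance. The abstract does not state the pruning rule, so two things here are assumptions: the stopping criterion (prune while the largest component keeps at least h points) and the choice h = floor((n + p + 1) / 2), which is borrowed from the convention common to high-breakdown estimators. All function names are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n - 1 tree edges as (length, i, j) tuples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def largest_component(n, edges):
    """Indices of the largest connected component (union-find)."""
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]   # path halving
            i = root[i]
        return i

    for _, i, j in edges:
        root[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return max(groups.values(), key=len)


def pruned_subset(points, h):
    """ASSUMED rule: drop the longest MST edges one at a time while the
    largest connected component still contains at least h points."""
    n = len(points)
    kept = sorted(mst_edges(points))      # ascending by edge length
    subset = list(range(n))
    while kept:
        candidate = largest_component(n, kept[:-1])
        if len(candidate) < h:
            break
        kept.pop()                        # remove the current longest edge
        subset = candidate
    return sorted(subset)


def location_and_scatter(points, subset):
    """Sample mean and covariance matrix of the selected points."""
    m, p = len(subset), len(points[0])
    mean = [sum(points[i][k] for i in subset) / m for k in range(p)]
    cov = [[sum((points[i][a] - mean[a]) * (points[i][b] - mean[b])
                for i in subset) / (m - 1)
            for b in range(p)] for a in range(p)]
    return mean, cov


# Five clustered points plus two distant ones (made-up illustration data).
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10), (11, 10)]
h = (len(points) + len(points[0]) + 1) // 2   # ASSUMED subset size, here 5
subset = pruned_subset(points, h)
mean, cov = location_and_scatter(points, subset)
print(subset, mean)   # [0, 1, 2, 3, 4] [0.5, 0.5]
```

Because pruning relies only on Euclidean edge lengths, the retained component can follow a non-convex shape, which is the property the abstract contrasts with Mahalanobis-based subsample selection. The quadratic-time Prim loop and the repeated component search keep the sketch short; they are not tuned for large data sets.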
