Abstract
In classical contamination models, such as the gross-error model (the Huber and Tukey contamination model, or case-wise contamination), observations are the units to be identified as outliers or not. This model is very useful when the number of variables is moderately small. Alqallaf et al. (Ann Stat 37(1):311–331, 2009) showed the limits of this approach for a larger number of variables and introduced the independent contamination model (cell-wise contamination), in which the cells are the units to be identified as outliers or not. One approach to dealing with both types of contamination at the same time is to filter out the contaminated cells from the data set and then apply a robust procedure able to handle case-wise outliers and missing values. Here, we develop a general framework to build filters in any dimension based on statistical data depth functions. We show that previous approaches, e.g., Agostinelli et al. (TEST 24(3):441–461, 2015b) and Leung et al. (Comput Stat Data Anal 111:59–76, 2017), are special cases. We illustrate our method by using the half-space depth.
Highlights
One of the most common problems in real data is the presence of outliers, i.e., observations that are well separated from the bulk of the data and that may be errors affecting the data
We compare the filter introduced in Agostinelli et al. (2015b), and the same filter with the improvements proposed in Leung et al. (2017), to the presented filters based on statistical data depth functions, obtained using the half-space depth (HS-UF for the univariate filter, HS-UBF for the univariate-bivariate filter, HS-UBPF for the univariate-bivariate-p-variate filter and HS-UBPF-DDC-C for the combination of the HS-UBPF with the modifications in Leung et al. (2017), where DDC stands for detect deviating cells)
Conclusions: we presented a general idea to construct filters based on statistical data depth functions, called depth filters
Summary
For a point $x \in \mathbb{R}^d$, let the statistical data depth of $x$ with respect to $F$ be $d(x; F)$, where $d(\cdot; F)$ satisfies the four properties given in Liu (1990) and Zuo and Serfling (2000a) and reported in Section SM-1 of the Supplementary Material. We assume that $d(x; F_n)$ is a uniformly consistent estimator of $d(x; F)$, that is, $\sup_x |d(x; F_n) - d(x; F)| \xrightarrow{a.s.} 0$ as $n \to \infty$, a property enjoyed by many statistical data depth functions, among others the simplicial depth (Liu 1990) and the half-space depth (Donoho and Gasko 1992). A filter is said to be consistent for a given distribution $F_0$ if, asymptotically, it does not flag any cell when the data come from the true distribution $F_0$. The derivation of this result is shown in Section SM-2 of the Supplementary Material
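The uniform consistency assumption can be illustrated numerically in the simplest setting. The following is a minimal sketch, not the filter developed in the paper: it assumes the one-dimensional case, where the empirical half-space depth reduces to min(Fn(x), 1 − Fn(x−)), and takes F to be the standard normal distribution. The function names (`halfspace_depth_1d`, `normal_depth`, `sup_gap`), the grid and the sample sizes are all illustrative choices, not taken from the source.

```python
import bisect
import math
import random


def halfspace_depth_1d(x, sorted_sample):
    """Empirical univariate half-space depth: min(Fn(x), 1 - Fn(x-))."""
    n = len(sorted_sample)
    n_le = bisect.bisect_right(sorted_sample, x)      # number of points <= x
    n_ge = n - bisect.bisect_left(sorted_sample, x)   # number of points >= x
    return min(n_le, n_ge) / n


def normal_depth(x):
    """Population half-space depth under N(0, 1): min(Phi(x), 1 - Phi(x))."""
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return min(phi, 1.0 - phi)


def sup_gap(n, rng, grid):
    """Largest |d(x; Fn) - d(x; F)| over a finite grid.

    This is only a lower bound on the true supremum over all x.
    """
    sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    return max(abs(halfspace_depth_1d(x, sample) - normal_depth(x)) for x in grid)


rng = random.Random(0)                         # fixed seed for reproducibility
grid = [i / 100.0 for i in range(-400, 401)]   # evaluation points in [-4, 4]
gaps = {n: sup_gap(n, rng, grid) for n in (100, 10_000)}
for n, gap in gaps.items():
    print(f"n = {n:>6}: grid sup gap = {gap:.4f}")
```

Under these assumptions the printed gap should be small for the larger sample, in line with the almost sure convergence stated above. In higher dimensions the half-space depth requires a minimum over all directions, which this sketch does not attempt.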