Abstract

Consistently following the principles of compositional data analysis has serious implications for distributional modeling and for statistical processing in general. In particular, because it lacks scale invariance, the well-known Dirichlet distribution is no longer the obligatory choice as the underlying distribution of compositions. The normal distribution on the simplex is preferred instead, because its appropriateness can be verified with a standard normality test in coordinates and its parameters are easy to interpret. It can therefore serve as the underlying distribution for a wide range of popular methods and tests, including Hotelling tests and MANOVA models, in any orthonormal coordinate representation. Because compositional data frequently contain outliers, data inconsistencies, rounding effects, dependencies among the observations, and similar problems, it is advisable in practice to apply robust counterparts to the classical methods. Either univariate or multivariate robust statistical processing can be performed, based on whichever logratio coordinate representation serves the purpose of the analysis. Even the classical estimators of location and scale, the sample mean and the sample covariance matrix, are highly sensitive to outliers. As robust alternatives, affine equivariant estimators (such as the MCD estimator) are preferred because they can be computed in any coordinate representation. Robust estimators of location and scale can then be used to compute Mahalanobis distances in order to identify multivariate outliers.
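The workflow outlined in the abstract (logratio coordinates, robust location and scatter, Mahalanobis distances) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the eight three-part compositions are invented, the helper names (pivot_coordinates, raw_mcd, mahalanobis) are hypothetical, pivot coordinates are used as one possible orthonormal logratio representation, and the MCD is found by brute force over all subsets of size h, without the consistency factor or reweighting step that practical implementations add. The brute-force search is only feasible for very small data sets.

```python
import math
from itertools import combinations


def pivot_coordinates(x):
    """Map a composition with D strictly positive parts to D-1 orthonormal
    logratio (pivot) coordinates."""
    logs = [math.log(v) for v in x]
    D = len(x)
    z = []
    for i in range(D - 1):
        r = D - i - 1  # number of parts that follow part i
        mean_rest = sum(logs[i + 1:]) / r
        z.append(math.sqrt(r / (r + 1)) * (logs[i] - mean_rest))
    return z


def mean_and_cov(points):
    """Sample mean and sample covariance (s00, s01, s11) of 2D points."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return (m0, m1), (s00, s01, s11)


def raw_mcd(points, h):
    """Brute-force raw MCD: mean and covariance of the h-subset whose
    sample covariance matrix has the smallest determinant."""
    best = None
    for subset in combinations(points, h):
        mean, (s00, s01, s11) = mean_and_cov(subset)
        det = s00 * s11 - s01 ** 2
        if best is None or det < best[0]:
            best = (det, mean, (s00, s01, s11))
    return best[1], best[2]


def mahalanobis(p, mean, cov):
    """Mahalanobis distance of a 2D point, using the closed-form 2x2 inverse."""
    s00, s01, s11 = cov
    det = s00 * s11 - s01 ** 2
    d0, d1 = p[0] - mean[0], p[1] - mean[1]
    return math.sqrt((s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det)


# Invented data: seven similar compositions and one deliberately atypical one.
compositions = [
    (0.60, 0.30, 0.10),
    (0.58, 0.31, 0.11),
    (0.62, 0.27, 0.11),
    (0.59, 0.29, 0.12),
    (0.61, 0.30, 0.09),
    (0.57, 0.32, 0.11),
    (0.63, 0.28, 0.09),
    (0.10, 0.20, 0.70),  # atypical observation
]

coords = [pivot_coordinates(x) for x in compositions]
center, scatter = raw_mcd(coords, h=6)
distances = [mahalanobis(z, center, scatter) for z in coords]
suspect = max(range(len(distances)), key=distances.__getitem__)

print("robust distances:", [round(d, 2) for d in distances])
print("most outlying observation (0-based index):", suspect)
```

Because pivot coordinates depend only on ratios of parts, rescaling a composition (for example, from proportions to percentages) leaves the coordinates unchanged. Since the raw estimator here carries no consistency correction, the sketch ranks observations by distance rather than comparing them against a chi-square cutoff.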
