Abstract

High-dimensional compositional data, that is, multivariate observations carrying relative information, frequently contain values below a detection limit (so-called rounded zeros). We introduce new model-based procedures for replacing these values with plausible numbers, so that the completed data set can be used with statistical methods that require complete data, such as regression or classification with high-dimensional explanatory variables. The procedures respect the geometry of compositional data and can be considered alternatives to existing methods. Simulations show that the proposed methods outperform existing methods, especially in high dimensions. Moreover, even with a large number of rounded zeros, the new methods improve the quality of the data, which is important for further analyses. The usefulness of the procedures is demonstrated using a data example from metabolomics.
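To make the idea of replacing rounded zeros while respecting compositional geometry concrete, the sketch below implements multiplicative replacement, a simple and widely used non-model-based technique. It is not the model-based procedure proposed here. Each zero is replaced by a fraction of that part's detection limit (the factor 0.65 is a common convention, assumed here), and the observed parts are shrunk by one common constant so that the total is unchanged. Because the rescaling is multiplicative, the ratios between observed parts, which carry the relative information, are preserved. The function names `multiplicative_replacement` and `clr` are hypothetical.

```python
import math
from typing import List, Sequence


def multiplicative_replacement(
    x: Sequence[float], dl: Sequence[float], factor: float = 0.65
) -> List[float]:
    """Replace rounded zeros in composition x by factor * detection limit.

    Observed (non-zero) parts are rescaled by a common constant so that the
    total of x is unchanged and all ratios between observed parts are kept.
    x and dl are assumed to be on the same scale (e.g. proportions).
    """
    if len(x) != len(dl):
        raise ValueError("x and dl must have the same length")
    total = sum(x)
    if total <= 0:
        raise ValueError("composition must have a positive total")
    # Imputed values for the rounded zeros; 0.0 for observed parts.
    deltas = [factor * d if v == 0 else 0.0 for v, d in zip(x, dl)]
    shrink = 1.0 - sum(deltas) / total
    if shrink <= 0:
        raise ValueError("imputed values exceed the composition total")
    # Zeros receive their imputed value; observed parts are shrunk multiplicatively.
    return [delta if v == 0 else v * shrink for v, delta in zip(x, deltas)]


def clr(x: Sequence[float]) -> List[float]:
    """Centred log-ratio transform; defined only for strictly positive parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    # Hypothetical 4-part composition with one rounded zero.
    completed = multiplicative_replacement([0.5, 0.3, 0.2, 0.0], [0.01] * 4)
    print(completed)
    print(clr(completed))  # finite now that every part is positive
```

After replacement every part is strictly positive, so log-ratio transforms such as clr are defined. Unlike a model-based procedure, this approach uses one fixed fraction of the detection limit and ignores the relationships among the parts.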
