Complex matrices such as soil have a range of measurable characteristics, and thus the data that describe them can be considered multidimensional. These characteristics can be strongly influenced by factors that introduce confounding effects and hinder analysis. Traditional statistical approaches lack the flexibility and granularity required to adequately evaluate such matrices, particularly those with large datasets of varying data types (i.e. quantitative non-compositional and quantitative compositional data). We present a statistical workflow designed to effectively analyse complex, multidimensional systems, even in the presence of confounding variables. The methodology involves exploratory analysis to identify the presence of confounding variables, followed by data decomposition (including strategies for both compositional and non-compositional quantitative data) to minimise the influence of confounding factors such as sampling site or location. These data processing methods then allow common patterns in the data to be highlighted, including the identification of biomarkers and the determination of non-trivial associations between variables. We demonstrate the utility of this statistical workflow by jointly analysing the chemical composition and fungal biodiversity of New Zealand vineyard soils that have been managed with either organic low-input or conventional input approaches. By applying this pipeline, we were able to identify biomarkers that distinguish viticultural soils managed under the two approaches and to unearth associations between the chemical and metagenomic profiles. While soil is one example of a system that can require this type of statistical methodology, there is a range of biological and ecological systems that are challenging to analyse due to the complex interplay of global and local effects. Utilising our pipeline will greatly enhance the way that these systems can be studied, and the quality and impact of the insight gained from their analysis.
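The decomposition step can be illustrated with a minimal sketch. It assumes a centred log-ratio (CLR) transform for the compositional data and simple per-site mean-centring to reduce a site effect; the abstract does not specify these techniques, so they are stand-ins rather than the workflow's actual method. The function names, the pseudocount, and the toy values are all hypothetical.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of each row (assumed treatment of compositional data)."""
    x = np.asarray(counts, dtype=float) + pseudocount  # pseudocount avoids log(0)
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)

def remove_site_means(values, sites):
    """Subtract each site's column means, keeping only within-site variation."""
    values = np.asarray(values, dtype=float)
    sites = np.asarray(sites)
    out = np.empty_like(values)
    for site in np.unique(sites):
        mask = sites == site
        out[mask] = values[mask] - values[mask].mean(axis=0)
    return out

# Hypothetical toy data: 4 samples from 2 sites.
sites = np.array(["A", "A", "B", "B"])
fungal_counts = np.array([[10, 0, 5], [8, 2, 6], [1, 20, 3], [0, 25, 4]])      # compositional
soil_chemistry = np.array([[6.1, 12.0], [6.3, 11.0], [5.2, 30.0], [5.0, 28.0]])  # non-compositional

fungal_within = remove_site_means(clr(fungal_counts), sites)
chem_within = remove_site_means(soil_chemistry, sites)
```

After this step, both tables have zero mean within each site, so any remaining pattern (for example, a difference between management approaches) is not driven by site-level offsets. This sketch only handles a single categorical confounder.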