Abstract

Microbiome and omics datasets are, by their intrinsic biological nature, of high dimensionality, characterized by counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are generally regarded as compositional since the total number of counts identified within a sample is irrelevant. The central concept in compositional data analysis is the logratio transformation, the simplest being the additive logratios with respect to a fixed reference component. A full set of additive logratios is not isometric, that is, they do not reproduce the geometry of all pairwise logratios exactly, but their lack of isometry can be measured by the Procrustes correlation. The reference component can be chosen to maximize the Procrustes correlation between the additive logratio geometry and the exact logratio geometry, and for high-dimensional data there are many potential references. As a secondary criterion, minimizing the variance of the reference component's log-transformed relative abundance values makes the subsequent interpretation of the logratios even easier. On each of three high-dimensional omics datasets the additive logratio transformation was performed, using references that were identified according to the above-mentioned criteria. For each dataset the compositional data structure was successfully reproduced, that is, the additive logratios were very close to being isometric. The Procrustes correlations achieved for these datasets were 0.9991, 0.9974, and 0.9902, respectively. We thus demonstrate, for high-dimensional compositional data, that additive logratios can provide a valid choice as transformed variables, which (a) are subcompositionally coherent, (b) explain 100% of the total logratio variance and (c) come measurably very close to being isometric.
The interpretation of additive logratios is much simpler than the complex isometric alternatives and, when the variance of the log-transformed reference is very low, it is even simpler since each additive logratio can be identified with a corresponding compositional component.
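
The reference-selection idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes strictly positive data (zeros would need replacing first), uses the unweighted centered logratios as a stand-in for the exact geometry of all pairwise logratios, and all function names (closure, clr, alr, procrustes_correlation, rank_references) are hypothetical.

```python
import numpy as np

def closure(counts):
    """Divide each sample (row) by its total so that rows sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def clr(comp):
    """Centered logratios: log of each part minus the row mean of the logs."""
    logs = np.log(comp)
    return logs - logs.mean(axis=1, keepdims=True)

def alr(comp, ref):
    """Additive logratios of every other part with respect to column `ref`."""
    logs = np.log(comp)
    others = np.delete(logs, ref, axis=1)
    return others - logs[:, [ref]]

def procrustes_correlation(X, Y):
    """Procrustes correlation between two configurations of the same samples.

    Both matrices are column-centered and scaled to unit Frobenius norm;
    the correlation is then the sum of singular values of X'Y.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc)
    Yc = Yc / np.linalg.norm(Yc)
    return np.linalg.svd(Xc.T @ Yc, compute_uv=False).sum()

def rank_references(comp):
    """Rank candidate references: highest Procrustes correlation first,
    then lowest variance of the log-transformed reference."""
    exact = clr(comp)
    logs = np.log(comp)
    rows = []
    for ref in range(comp.shape[1]):
        r = procrustes_correlation(exact, alr(comp, ref))
        rows.append((ref, r, logs[:, ref].var(ddof=1)))
    return sorted(rows, key=lambda t: (-t[1], t[2]))
```

Calling rank_references on a closed samples-by-components matrix returns (reference index, Procrustes correlation, log-variance) tuples, best candidate first. The loop computes one singular value decomposition per candidate, so for very high-dimensional data a pre-screening of candidates may be needed.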

Highlights

  • The Frontiers in Microbiology article by Gloor et al (2017) is emphatically titled: “Microbiome datasets are compositional: and this is not optional.” We agree.

  • how close the ALR transformation is to being isometric, Figure 5 shows the between-sample distances computed on the ALRs plotted against the exact logratio distances


Summary

Introduction

The Frontiers in Microbiology article by Gloor et al (2017) is emphatically titled: “Microbiome datasets are compositional: and this is not optional.” We agree. The number of so-called reads obtained by high-throughput sequencing varies from sample to sample and is of no relevance to the investigation, much the same as the size of a rock is irrelevant to the study of its geochemical composition. It is the relative values of the read counts that are the data of interest, making the data strictly compositional (Fernandes et al, 2014). It is convenient to eliminate the effect of the sample totals by normalizing, or closing, the data, so that sample values sum to 1; these vectors of non-negative sample values with constant sums are called compositions. Once this initial step is made, the question remains how to analyze, relate and interpret the components of the compositions, be they microbial genes, operational taxonomic units, transcripts or metabolites. This is the first and most fundamental step in the pipeline for analyzing compositional data.
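
As a small illustration of why the sample total carries no information, the sketch below closes two made-up read-count vectors that differ only in sequencing depth and compares their compositions and one logratio. The counts and the helper names (closure, log_ratio) are hypothetical, chosen only for this example.

```python
import math

def closure(counts):
    """Normalize a vector of counts so that its values sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def log_ratio(parts, i, j):
    """Logratio of part i to part j; unchanged by rescaling the whole sample."""
    return math.log(parts[i] / parts[j])

# Hypothetical reads for three components, at two sequencing depths.
shallow = [120, 30, 50]
deep = [c * 10 for c in shallow]

# Closing removes the depth: both samples give the same composition.
same_composition = all(
    math.isclose(a, b) for a, b in zip(closure(shallow), closure(deep))
)

# Logratios agree whether computed on raw counts or on closed data.
same_logratio = math.isclose(
    log_ratio(shallow, 0, 1), log_ratio(closure(deep), 0, 1)
)
```

Both flags evaluate to True here, which is the sense in which only the relative values of the counts matter.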

