Log-ratio analysis of microbiome data with many zeroes is library size dependent.

Dennis E Te Beest, Tim W R Möhlmann, Els H Nijhuis, Cajo J F Ter Braak

doi:10.1111/1755-0998.13391

Open Access

https://doi.org/10.1111/1755-0998.13391

Journal: Molecular Ecology Resources
Publication Date: May 3, 2021
Citations: 10
License type: CC BY 4.0

Affiliation: Wageningen University & Research

Abstract

Microbiome composition data collected through amplicon sequencing are count data on taxa in which the total count per sample (the library size) is an artefact of the sequencing platform, and as a result, such data are compositional. To avoid library size dependency, one common way of analysing multivariate compositional data is to perform a principal component analysis (PCA) on data transformed with the centred log‐ratio, hereafter called a log‐ratio PCA. Two aspects typical of amplicon sequencing data are the large differences in library size and the large number of zeroes. In this study, we show on real data and by simulation that, applied to data that combine these two aspects, log‐ratio PCA is nevertheless heavily dependent on the library size. This leads to a reduction in power when testing against any explanatory variable in log‐ratio redundancy analysis. If there is additionally a correlation between the library size and the explanatory variable, then the type 1 error becomes inflated. We explore putative solutions to this problem.
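The library size dependence described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' simulation design: all samples share a single composition, so any structure in the ordination is technical. The pseudocount of 1, the log-normal abundance model, the range of library sizes and the function names are assumptions made here for illustration.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of a samples-by-taxa count matrix.

    The pseudocount is an assumption made here so that log(0) is defined.
    """
    logged = np.log(counts + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

def first_pc_scores(matrix):
    """Sample scores on the first principal component (PCA via SVD)."""
    centred = matrix - matrix.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, 0] * s[0]

def simulate(n_samples=60, n_taxa=200, seed=1):
    """Draw counts from one shared composition at varying library sizes."""
    rng = np.random.default_rng(seed)
    # One skewed composition for all samples: no biological differences.
    abundance = np.exp(rng.normal(0.0, 2.0, size=n_taxa))
    composition = abundance / abundance.sum()
    # Library sizes spread over two orders of magnitude (about 500 to 50000).
    library_size = np.round(10 ** rng.uniform(2.7, 4.7, size=n_samples)).astype(int)
    counts = np.vstack([rng.multinomial(n, composition) for n in library_size])
    return counts, library_size

counts, library_size = simulate()
scores = first_pc_scores(clr(counts))
zero_fraction = (counts == 0).mean()
correlation = np.corrcoef(scores, np.log(library_size))[0, 1]
print(f"fraction of zeroes: {zero_fraction:.2f}")
print(f"|corr(PC1, log library size)|: {abs(correlation):.2f}")
```

Because the samples differ only in sequencing depth, a strong correlation between the first axis and the log library size reflects the zero handling, not biology.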

Highlights

Microbiome composition data collected through amplicon sequencing are count data on taxa in which the total count per sample is a technical, ill-understood artefact that carries no biological information; as a result, such data are compositional.
Several authors have advocated compositional data analysis for such data (Gloor et al., 2017; Tsilimigras & Fodor, 2016). This implies transforming the data with the centred log-ratio transformation, followed by a standard least-squares method such as principal component analysis (PCA).
Two aspects typical of amplicon sequencing data complicate the use of log-ratio PCA: the large number of zeroes combined with a large variability in the library size.

Summary

INTRODUCTION

Microbiome composition data collected through amplicon sequencing are count data on taxa in which the total count per sample (the library size) is a technical, ill-understood artefact that carries no biological information; as a result, such data are compositional. Several authors have advocated compositional data analysis for such data (Gloor et al., 2017; Tsilimigras & Fodor, 2016). For multivariate analysis, this implies transforming the data with the centred log-ratio transformation (clr), followed by a standard least-squares method such as principal component analysis (PCA). That is, the data (counts or proportions) are logarithmically transformed and double-centred, and a PCA is then applied. This is often called log-ratio PCA or log-ratio analysis (Aitchison, 1983; Greenacre, 2018). Two aspects typical of amplicon sequencing data complicate the use of log-ratio PCA: the large number of zeroes combined with a large variability in the library size.
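The transformation described above can be written out explicitly. The notation is introduced here for illustration and is not taken from the paper: x_{ij} is the count of taxon j in sample i, with n samples and D taxa, and all counts are assumed strictly positive.

```latex
% Notation assumed for illustration: x_{ij} > 0, i = 1..n samples, j = 1..D taxa.
\begin{align*}
\operatorname{clr}(x_i)_j &= \log x_{ij} - \frac{1}{D}\sum_{k=1}^{D}\log x_{ik}, \\
\operatorname{clr}(c\,x_i)_j &= \log c + \log x_{ij}
  - \frac{1}{D}\sum_{k=1}^{D}\bigl(\log c + \log x_{ik}\bigr)
  = \operatorname{clr}(x_i)_j \qquad (c > 0), \\
y_{ij} &= \log x_{ij} - \frac{1}{D}\sum_{k=1}^{D}\log x_{ik}
  - \frac{1}{n}\sum_{l=1}^{n}\log x_{lj}
  + \frac{1}{nD}\sum_{l=1}^{n}\sum_{k=1}^{D}\log x_{lk}.
\end{align*}
```

The second line shows why the clr removes the library size when all counts are positive: a sample-wide scaling factor c cancels. The third line is the double-centred matrix on which the PCA is performed. With zeroes, log x_{ij} is undefined; if zeroes are handled with a pseudocount δ (one common choice, assumed here), then log(c x_{ij} + δ) no longer equals log c + log(x_{ij} + δ), so the cancellation fails.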

MATERIALS AND METHODS

RESULTS

DISCUSSION
