Regression Models for Compositional Data: General Log-Contrast Formulations, Proximal Optimization, and Microbiome Data Applications

Patrick L Combettes, Christian L Müller

doi:10.1007/s12561-020-09283-2

Abstract

Compositional data sets are ubiquitous in science, including geology, ecology, and microbiology. In microbiome research, compositional data primarily arise from high-throughput sequence-based profiling experiments. These data comprise microbial compositions in their natural habitat and are often paired with covariate measurements that characterize physicochemical habitat properties or the physiology of the host. Inferring parsimonious statistical associations between microbial compositions and habitat- or host-specific covariate data is an important step in exploratory data analysis. A standard statistical model linking compositional covariates to continuous outcomes is the linear log-contrast model. This model describes the response as a linear combination of log-ratios of the original compositions and has been extended to the high-dimensional setting via regularization. In this contribution, we propose a general convex optimization model for linear log-contrast regression which includes many previous proposals as special cases. We introduce a proximal algorithm that solves the resulting constrained optimization problem exactly with rigorous convergence guarantees. We illustrate the versatility of our approach by investigating the performance of several model instances on soil and gut microbiome data analysis tasks.

Highlights

Compositional data sets are ubiquitous in many areas of science, spanning such disparate fields as geology and ecology.
The quantification of the relative abundances of operational taxonomic units (OTUs) in the environment is often accompanied by measurements of other covariates, including physicochemical properties of the underlying habitats, variables related to the health status of the host, or those coming from other high-throughput protocols, such as metabolomics or flow cytometry.
In the low-dimensional setting, the standard log-contrast model with zero-sum constraints can be estimated by solving a least-squares problem subject to a linear constraint, or alternatively, via standard linear regression applied to isometric log-ratio transformed compositions [14].
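
The constrained least-squares route mentioned in the last highlight can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `fit_log_contrast`, the omission of an intercept, and the synthetic Dirichlet data are all assumptions made for the example. It solves the KKT system of the equality-constrained problem directly with NumPy.

```python
import numpy as np


def fit_log_contrast(X, y):
    """Least squares on log-compositions subject to sum(beta) == 0.

    X : (n, p) array of strictly positive compositions (rows need not sum to 1).
    y : (n,) array of continuous responses.
    Hypothetical helper for illustration; no intercept is fitted.
    """
    Z = np.log(X)
    n, p = Z.shape
    C = np.ones((1, p))  # zero-sum constraint: C @ beta = 0
    # KKT system of min ||Z beta - y||^2 s.t. C beta = 0:
    #   [Z'Z  C'] [beta]   [Z'y]
    #   [C    0 ] [mu  ] = [0  ]
    K = np.block([[Z.T @ Z, C.T], [C, np.zeros((1, 1))]])
    rhs = np.concatenate([Z.T @ y, [0.0]])
    sol = np.linalg.solve(K, rhs)
    return sol[:p]  # drop the Lagrange multiplier


# Synthetic example (assumed data, not from the paper).
rng = np.random.default_rng(0)
X_demo = rng.dirichlet(np.ones(5), size=200)
beta_true = np.array([1.0, -1.0, 0.5, -0.5, 0.0])
y_demo = np.log(X_demo) @ beta_true + 0.01 * rng.standard_normal(200)
beta_hat = fit_log_contrast(X_demo, y_demo)
```

Because the coefficients sum to zero, rescaling every composition by a common factor leaves the fit unchanged, which is the property that makes the zero-sum constraint natural for relative abundance data.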

Summary

Introduction

Compositional data sets are ubiquitous in many areas of science, spanning such disparate fields as geology and ecology. Compositional data arise from high-throughput sequence-based microbiome profiling techniques, such as targeted amplicon sequencing (TAS) and metagenomic profiling. These methods generate large-scale genomic survey data of microbial community compositions in their natural habitat, ranging from marine ecosystems to host-associated environments. A particular feature of our model is the joint estimation of regression vectors and associated scales for log-contrast models, similar to the scaled Lasso approach in high-dimensional linear regression [31]. This is achieved by leveraging recent results on the connection between perspective functions and statistical models [8,9,10]. We illustrate the versatility of our modeling approach by applying novel log-contrast model instances to environmental and gut microbiome data analysis tasks.
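
To illustrate what joint estimation of a regression vector and a scale can look like, the following is a sketch of a scaled-Lasso-type objective with the zero-sum constraint added. The notation is assumed for this example and is not taken from the paper: $Z = \log X \in \mathbb{R}^{n \times p}$ is the matrix of log-transformed compositions, $y \in \mathbb{R}^n$ the response, $\sigma$ the scale, and $\lambda > 0$ a regularization parameter. The formulations in the paper are more general.

```latex
\min_{\beta \in \mathbb{R}^p,\ \sigma > 0}\;
  \frac{\lVert y - Z\beta \rVert_2^2}{2 n \sigma}
  + \frac{\sigma}{2}
  + \lambda \lVert \beta \rVert_1
\quad \text{subject to} \quad
  \sum_{j=1}^{p} \beta_j = 0 .
```

The first term is the perspective of the squared norm, which is jointly convex in $(\beta, \sigma)$. For a fixed $\beta$, minimizing over $\sigma$ gives $\sigma = \lVert y - Z\beta \rVert_2 / \sqrt{n}$, so the scale estimate tracks the residual size.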

Statistical Log‐Contrast Data Formation Model

Sparse Log‐Contrast Regression

Sparse Log‐Contrast Regression with Side Information

Robust Log‐Contrast Regression

Optimization of General Log‐Contrast Models

Convex Optimization Model

Algorithm

Model Selection

Applications to Compositional Microbiome Data

Body Mass Index Prediction from Gut Microbiome Data

Relationship Between Soil Microbiome and pH Concentration

Discussion and Conclusion

Journal: Statistics in Biosciences
Publication Date: Jun 19, 2020
Citations: 22
License type: open-access
