Abstract
Motivation: Microbial communities influence their environment by modifying the availability of compounds, such as nutrients or chemical elicitors. Knowing the microbial composition of a site is therefore relevant to improve productivity or health. However, sequencing facilities are not always available, or may be prohibitively expensive in some cases. Thus, it would be desirable to computationally predict the microbial composition from more accessible, easily-measured features.

Results: Integrating deep learning techniques with microbiome data, we propose an artificial neural network architecture based on heterogeneous autoencoders to condense the long vector of microbial abundance values into a deep latent space representation. Then, we design a model to predict the deep latent space and, consequently, to predict the complete microbial composition using environmental features as input. The performance of our system is examined using the rhizosphere microbiome of maize. We reconstruct the microbial composition (717 taxa) from the deep latent space (10 values) with high fidelity (>0.9 Pearson correlation). We then successfully predict microbial composition from environmental variables, such as plant age, temperature or precipitation (0.73 Pearson correlation, 0.42 Bray–Curtis). We extend this to predict microbiome composition under hypothetical scenarios, such as future climate change conditions. Finally, via transfer learning, we predict microbial composition in a distinct scenario with only 100 sequences, and distinct environmental features. We propose that our deep latent space may assist microbiome-engineering strategies when technical or financial resources are limited, through predicting current or future microbiome compositions.

Availability and implementation: Software, results and data are available at https://github.com/jorgemf/DeepLatentMicrobiome

Supplementary information: Supplementary data are available at Bioinformatics online.
Highlights
Some microbiome studies go one step further, following a predictive approach. These translational approaches should allow us to design solutions based on microbiome modulation to address problems in human and plant health, and take microbial composition as a predictor of a particular phenotypic feature, using linear regression and machine learning (ML) techniques (Caporaso et al., 2011; Zhou and Gallins, 2019).
3.1 Microbial composition is predicted with high accuracy from environmental features
Given the absence of prior art or a gold standard for this kind of prediction, we compare the accuracy of our models with three baseline models capable of simultaneous regression of multiple variables: (a) a default predictor that returns, for each OTU, the average over all training samples, independently of the mapping features; (b) a linear regression model; (c) a non-linear model, i.e. a Multi-Layer Perceptron (MLP).
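As an illustration of the simplest of these baselines, the sketch below implements the default predictor (the per-OTU training mean) and scores it with the two metrics quoted in the abstract, Pearson correlation and Bray–Curtis dissimilarity. It is a minimal sketch rather than the authors' code: the function names, the toy abundance values, and the choice to compute both metrics per sample are assumptions made here for illustration.

```python
from math import sqrt


def mean_baseline(train_abundances):
    """Baseline (a): per-OTU mean over the training samples.

    The environmental (mapping) features are ignored entirely, so the
    same vector is predicted for every test sample.
    """
    n_samples = len(train_abundances)
    n_otus = len(train_abundances[0])
    return [
        sum(sample[j] for sample in train_abundances) / n_samples
        for j in range(n_otus)
    ]


def pearson(x, y):
    """Pearson correlation between two equal-length abundance vectors.

    Undefined (division by zero) if either vector is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


# Hypothetical relative abundances for 3 OTUs (rows are samples).
train = [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]]
observed = [0.5, 0.3, 0.2]

predicted = mean_baseline(train)
print("predicted:", predicted)
print("Pearson:", pearson(predicted, observed))
print("Bray-Curtis:", bray_curtis(predicted, observed))
```

Because this predictor never looks at the environmental features, any model that does use them should beat it; that is what makes it a useful lower bound next to the linear and MLP baselines.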
Summary
The microbiome: Microbes are everywhere, in humans, animals, plants and the environment (soil, water, air), executing numerous biological functions whose absence would dramatically reduce the quality and quantity of life on earth (Gilbert and Neufeld, 2014). Microbial community functions include collaborating in carbon and nitrogen cycles, providing nutrients to animal and plant cells by breaking complex molecules into smaller compounds, and training and triggering the immune system to fight against pathogens. These microbiome functions entail applications in health and medicine, climate change, sustainable agriculture, environment and biofuels. Some microbiome studies go one step further, following a predictive approach. These translational approaches should allow us to design solutions based on microbiome modulation to address problems in human and plant health, and take microbial composition as a predictor of a particular phenotypic feature, using linear regression and machine learning (ML) techniques (Caporaso et al., 2011; Zhou and Gallins, 2019).