Generative models are widely used in modern biology. They have been used to model the variation of protein sequences, entire genomes, and RNA sequencing profiles. Importantly, generative models have been used to interpolate and extrapolate to unobserved regimes of data in order to design biological systems with desired properties. For example, there has been a boom in machine-learning models that aid in the design of proteins with user-specified structures or functions. Host-associated microbiomes play important roles in animal health and disease, as well as in the productivity and environmental footprint of livestock species. However, there are no generative models of host-associated microbiomes. One chief reason is that off-the-shelf machine-learning models are data-hungry, whereas microbiome studies usually deal with large variability and small sample sizes. Moreover, microbiome compositions are heavily context-dependent, with characteristics of the host and the abiotic environment leading to distinct patterns in host-microbiome associations. Consequently, off-the-shelf generative modeling has not been successfully applied to microbiomes. To address these challenges, we develop a generative model for host-associated microbiomes derived from the consumer/resource (C/R) framework. This derivation allows us to fit the model to readily available cross-sectional microbiome profile data. Using data from three animal hosts, we show that this mechanistic generative model has several salient features. The model identifies a latent space that represents the variables that determine the growth and, therefore, the relative abundances of microbial species. Probabilistic modeling of variation in this latent space allows us to generate realistic in silico microbial communities. The model can assign probabilities to microbiomes, thereby allowing us to discriminate between dissimilar ecosystems. Importantly, the model predictively captures host-associated microbiomes and the corresponding hosts' phenotypes, enabling the design of microbial communities associated with user-specified host characteristics.