Abstract

Microbes play a critical role in human health and disease, especially in densely populated cities. Understanding the microbial ecosystem of an urban environment is essential for monitoring the transmission of infectious diseases and identifying potentially urgent threats. To this end, researchers have begun collecting and analyzing metagenomic samples from subway stations in major cities. However, city-wide sampling at fine-grained geospatial resolution is too costly and time-consuming. In this paper, we present MetaMLAnn, a neural-network-based approach for inferring microbial communities at unmeasured locations from several data sources in an urban environment, including subway line information, sampling material, and microbial compositions. MetaMLAnn exploits these heterogeneous features to capture the latent dependencies between microbial compositions and the urban environment, and thereby accurately infers microbial communities at unsampled locations. Moreover, we propose a regularization framework that incorporates species relatedness as prior knowledge. We evaluate our approach on public metagenomics data collected from multiple subway stations in New York and Boston. The experimental results show that MetaMLAnn consistently outperforms five conventional classifiers across several evaluation metrics. The code, features, and labels are available at https://github.com/zgy921028/MetaMLAnn

