Abstract

Many diseases are driven by gene-environment interactions. One important environmental factor is the metabolic output of the human gut microbiota. A comprehensive catalog of human metabolites of microbial origin is critical for data-driven approaches to understanding how microbial metabolism contributes to human health and disease. Here we present a novel integrated approach to automatically extract and analyze microbial metabolites from 28 million published biomedical records. First, we classified 28,851,232 MEDLINE records as microbial metabolism-related or not. Second, we extracted candidate microbial metabolites from the classified texts. Third, we developed signal prioritization algorithms to further differentiate microbial metabolites from metabolites originating from other sources. Finally, we systematically analyzed the interactions between the extracted microbial metabolites and human genes. A total of 11,846 metabolites were extracted from 28 million MEDLINE articles. The combined text classification and signal prioritization significantly enriched true positives among the top-ranked metabolites: manual curation of the top 100 metabolites showed a precision of 0.55, a 38.3-fold enrichment over the precision of 0.014 for baseline extraction. More importantly, 29% of the extracted microbial metabolites have not been captured by existing databases. We also performed a data-driven analysis of the interactions between the extracted microbial metabolites and human genes. This study represents the first effort towards automatically extracting and prioritizing microbial metabolites from published biomedical literature. It can lay a foundation for future work on extracting microbial metabolite relationships from the literature and facilitate data-driven studies of how microbial metabolism contributes to human diseases.

Highlights

  • Many diseases are driven by gene-environment interactions

  • We found that many microbial metabolites have been reported in the biomedical literature but are not classified as being of microbial origin by the Human Metabolome Database (HMDB). One example is the sentence “We further investigate the bioactivity of the confirmed metabolites, and identify two microbiota-generated metabolites (5-hydroxy-L-tryptophan and salicylate) as activators of the aryl hydrocarbon receptor” (PMID 25411059)

  • We analyzed the interactions between the identified microbial metabolites and human genes, which may provide mechanistic insights into how gut microbial metabolism contributes to human health


Summary

Introduction

Many diseases are driven by gene-environment interactions. One important environmental factor is the metabolic output of the human gut microbiota. We developed network-based systems approaches to examine genetic interactions between microbial metabolites and human diseases, and revealed strong mechanistic links between trimethylamine N-oxide (TMAO), a gut microbial metabolite of dietary meat and fat, and both colorectal cancer[13] and Alzheimer’s disease[15]. These computationally generated findings were subsequently verified by other researchers using patient sample-based metabolomics studies, which showed that plasma TMAO is positively associated with colorectal cancer risk[19] and that the gut microbiota-derived metabolite TMAO is elevated in Alzheimer’s disease[20]. Leveraging evidence from tens of millions of published biomedical records, we are taking an alternative approach to automatically classify, extract, and prioritize microbial metabolites from free-text documents.
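
The evaluation reported for this approach compares the precision among top-ranked metabolites with the precision of baseline extraction. A minimal sketch of that kind of comparison follows, assuming a ranked candidate list and a manually curated set of true microbial metabolites. The function names, metabolite names, and labels are hypothetical toy values for illustration, not the study's data or code.

```python
def precision_at_k(ranked, curated_true, k):
    """Fraction of the top-k ranked candidates that are in the curated set."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for name in top if name in curated_true) / len(top)


def fold_enrichment(precision_top, precision_baseline):
    """Ratio of top-k precision to baseline precision."""
    if precision_baseline <= 0:
        raise ValueError("baseline precision must be positive")
    return precision_top / precision_baseline


# Hypothetical toy ranking (best first) and hypothetical curation labels.
ranked = ["butyrate", "indole", "glucose", "propionate", "cholesterol",
          "urea", "lactose", "sodium", "creatinine", "albumin"]
curated_true = {"butyrate", "indole", "propionate"}

p_top = precision_at_k(ranked, curated_true, k=4)             # 3 of the top 4
p_base = precision_at_k(ranked, curated_true, k=len(ranked))  # 3 of all 10
print(p_top, p_base, fold_enrichment(p_top, p_base))
```

Here the baseline is taken as the precision over the full unranked candidate list, which is one plausible reading of "baseline extraction" and is an assumption of this sketch.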

