Abstract

Trillions of bacteria in the human body (the human microbiota) affect human health and disease by controlling host functions through small-molecule metabolites. An accurate and comprehensive catalog of the metabolic output of the human microbiota is critical for a deep understanding of how microbial metabolism contributes to human health. The large body of published biomedical research articles is a rich resource for microbiome studies. However, automatically extracting microbial metabolites from free-text documents and differentiating them from other human metabolites is a challenging task. Here we developed an integrated approach called Co-occurrence Metabolite Network Ranking (CoMNRank), which combines named entity extraction, network construction, and topic-sensitive network-based prioritization to extract and prioritize microbial metabolites from biomedical articles. The text data comprised 28,851,232 MEDLINE records. CoMNRank consists of three steps: (1) extraction of human metabolites from MEDLINE records; (2) construction of a weighted co-occurrence metabolite network (CoMN); and (3) prioritization of microbial metabolites and their differentiation from other human metabolites. In the first step, we extracted 11,846 human metabolites from MEDLINE articles; treating every extracted metabolite as microbial gives a baseline precision of 0.014, recall of 0.959, and F1 of 0.028. We then constructed a weighted CoMN of 6,996 nodes and 986,186 edges. CoMNRank effectively prioritized microbial metabolites: the precision among top-ranked metabolites is 0.45, a 31-fold enrichment over the overall precision of 0.014. Manual curation of the top 100 metabolites showed a true precision of 0.67, and 48% of these true positives are not captured by existing databases. Our study lays the foundation for future microbial entity and relationship extraction tasks, as well as for data-driven studies of how microbial metabolism contributes to human health and disease.
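Steps (2) and (3) can be illustrated with a minimal sketch. The abstract does not specify the edge-weighting scheme or the ranking algorithm, so this sketch makes two assumptions: edge weights are raw counts of records in which two metabolites co-occur, and "topic-sensitive" prioritization is approximated by a random walk with restart to a seed set of known microbial metabolites (a personalized PageRank-style score). The function names `build_comn` and `rank_from_seeds`, the `restart` parameter, and the toy records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_comn(records):
    """Build a weighted co-occurrence network from per-record metabolite lists.

    Assumption: edge weight = number of records mentioning both metabolites.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for metabolites in records:
        # De-duplicate within a record so each pair counts once per record.
        for a, b in combinations(sorted(set(metabolites)), 2):
            weights[a][b] += 1.0
            weights[b][a] += 1.0
    return weights


def rank_from_seeds(weights, seeds, restart=0.15, iterations=100):
    """Score every node by a random walk with restart to the seed set."""
    nodes = sorted(weights)
    seeds = [s for s in seeds if s in weights]
    if not seeds:
        raise ValueError("no seed metabolite is present in the network")
    # The restart distribution is uniform over the seeds (the "topic").
    prior = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    # Every node has at least one edge, so its total edge weight is positive.
    strength = {n: sum(weights[n].values()) for n in nodes}
    scores = dict(prior)
    for _ in range(iterations):
        nxt = {n: restart * prior[n] for n in nodes}
        for n in nodes:
            for m, w in weights[n].items():
                # Spread the walker's mass in proportion to edge weight.
                nxt[m] += (1.0 - restart) * scores[n] * w / strength[n]
        scores = nxt
    ranking = sorted(nodes, key=lambda n: scores[n], reverse=True)
    return ranking, scores


# Toy records for illustration only (not data from the study).
records = [
    ["butyrate", "propionate", "acetate"],
    ["butyrate", "propionate"],
    ["butyrate", "acetate"],
    ["cholesterol", "cortisol"],
    ["cholesterol", "acetate"],
]
ranking, scores = rank_from_seeds(build_comn(records), seeds=["butyrate"])
for name in ranking:
    print(f"{name}\t{scores[name]:.3f}")
```

In this sketch, metabolites that co-occur often with the seed receive higher scores than metabolites that are only distantly connected to it, which is the intuition behind using the network to separate microbial metabolites from other human metabolites.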

