Abstract

Motivation: Untargeted metabolomics comprehensively characterizes small molecules and elucidates activities of biochemical pathways within a biological sample. Despite computational advances, interpreting collected measurements and determining their biological role remains a challenge.

Results: To interpret measurements, we present an inference-based approach, termed Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA). Our approach captures the metabolomics measurements and the biological network for the sample under study in a generative model and uses stochastic sampling to compute posterior probability distributions. PUMA predicts the likelihood of pathways being active and then derives probabilistic annotations, which assign chemical identities to measurements. Unlike prior pathway analysis tools that analyze differentially active pathways, PUMA defines a pathway as active if the likelihood that the pathway generated the observed measurements is above a particular (user-defined) threshold. Because there are no “ground truth” metabolomics datasets, in which all measurements are annotated and pathway activities are known, PUMA is validated on synthetic datasets designed to mimic cellular processes. On average, PUMA outperforms pathway enrichment analysis by 8%. PUMA is also applied to two case studies, where it suggests many biologically meaningful pathways as active. Annotation results were in agreement with those obtained using other tools that utilize additional information in the form of spectral signatures. Importantly, PUMA annotates many measurements, suggesting 23 chemical identities for metabolites that were previously only identified as isomers, and a significant number of additional putative annotations over spectral database lookups. For an experimentally validated 50-compound dataset, annotations using PUMA yielded 0.833 precision and 0.676 recall.
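The thresholding rule described in the abstract can be illustrated with a minimal sketch. This is not the PUMA implementation: it assumes that pathway activity is summarized as the fraction of posterior samples in which a pathway is switched on, and the pathway names, sample values, function names, and the 0.7 threshold are all hypothetical.

```python
from typing import Dict, List


def pathway_activity(samples: Dict[str, List[int]]) -> Dict[str, float]:
    """Estimate each pathway's probability of being active as the fraction
    of posterior samples in which it is switched on (1)."""
    return {name: sum(draws) / len(draws) for name, draws in samples.items()}


def active_pathways(posterior: Dict[str, float], threshold: float) -> List[str]:
    """Return, in sorted order, the pathways whose estimated probability
    exceeds the user-defined threshold."""
    return sorted(name for name, p in posterior.items() if p > threshold)


# Hypothetical posterior samples (1 = active in that draw); not data from the paper.
samples = {
    "glycolysis": [1, 1, 1, 0, 1, 1, 1, 1],
    "tca_cycle": [1, 0, 1, 1, 0, 1, 1, 0],
    "urea_cycle": [0, 0, 1, 0, 0, 0, 0, 0],
}

posterior = pathway_activity(samples)
active = active_pathways(posterior, threshold=0.7)  # threshold is an assumed value
print(active)
```

Lowering the threshold admits more pathways as active, so the choice trades off sensitivity against the risk of reporting weakly supported pathways.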

Highlights

  • Analyzing cellular responses to perturbations such as drug treatments and genetic modifications promises to elucidate cellular metabolism, leading to improved outcomes in personalized medicine and synthetic biology

  • To give confidence in the performance of Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA), it is desirable to validate the generative models against a “ground truth” dataset, where all measured metabolites are annotated and there is sufficient experimental evidence to allow attributing measured metabolites to specific pathways

  • As central metabolism and network topology are conserved across many organisms [35], we generated the synthetic datasets for a representative organism, the Chinese Hamster Ovary (CHO) cell, a popular biological sample that is discussed as a case study

Summary

Introduction

Analyzing cellular responses to perturbations such as drug treatments and genetic modifications promises to elucidate cellular metabolism, leading to improved outcomes in personalized medicine and synthetic biology. Metabolomics has emerged as the new ‘omics’, providing a readout of cellular activity that is most predictive of phenotype. So far, metabolomics has played a critical role in advancing applications spanning biomarker discovery [1], drug discovery and development [2], plant biology [3], nutrition [4], and environmental health [5]. The advent of untargeted metabolomics, which measures the molecular masses and spectral signatures of thousands of small-molecule metabolites in a biological sample, offers unprecedented opportunities to characterize the phenotype. The success of untargeted metabolomics in providing insight into cellular behavior hinges on solving two problems. Metabolite annotation concerns associating measured masses with their
