Abstract

Data analysis for metabolomics is undergoing rapid progress thanks to the proliferation of novel tools and the standardization of existing workflows. As untargeted metabolomics datasets and experiments continue to increase in size and complexity, standardized workflows are often not sufficiently sophisticated. In addition, the ground truth for untargeted metabolomics experiments is intrinsically unknown, which makes the performance of tools difficult to evaluate. Here, the problem of dynamic multi-class metabolomics experiments was investigated using a simulated dataset with a known ground truth. This simulated dataset was used to evaluate the performance of tinderesting, a new and intuitive tool based on gathering expert knowledge to be used in machine learning. The results were compared to EDGE, a statistical method for time series data. This paper presents three novel outcomes. The first is a way to simulate dynamic metabolomics data with a known ground truth based on ordinary differential equations. This method is made available through the MetaboLouise R package. Second, the EDGE tool, originally developed for genomics data analysis, is highly performant in analyzing dynamic case vs. control metabolomics data. Third, the tinderesting method is introduced to analyze more complex dynamic metabolomics experiments. This tool consists of a Shiny app for collecting expert knowledge, which in turn is used to train a machine learning model to emulate the decision process of the expert. This approach does not replace traditional data analysis workflows for metabolomics, but can provide additional information, improved performance or easier interpretation of results. The advantage is that the tool is agnostic to the complexity of the experiment, and thus is easier to use in advanced setups. All code for the presented analysis, along with MetaboLouise and tinderesting, is freely available.

Highlights

  • The field of metabolomics, which studies small molecules inside organisms, has expanded considerably over the last two decades and the amount of data being generated by metabolomics experiments keeps increasing

  • The result of the dynamic metabolomics data simulation is a time curve of the concentration for each metabolite (Figure 5). These continuous ground truth data are sampled at discrete time points, corresponding to actual experiments where the underlying biological process is continuous but samples are taken only at distinct time intervals because of practical limitations

  • We present the tinderesting tool to collect expert knowledge in an easy and quick manner via a Shiny app
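
The idea in the highlights above, a continuous ground-truth concentration curve that is observed only at discrete sampling times, can be illustrated with a minimal sketch. This is not the MetaboLouise implementation: the two-metabolite chain (A converts to B, B is degraded), the rate constants `k1` and `k2`, the sampling times and all function names are assumptions chosen purely for illustration.

```python
import math


def derivatives(state, k1, k2):
    """Right-hand side of a hypothetical two-metabolite ODE system.

    dA/dt = -k1 * A            (A is converted into B)
    dB/dt =  k1 * A - k2 * B   (B is produced from A and degraded)
    """
    a, b = state
    return (-k1 * a, k1 * a - k2 * b)


def rk4_step(state, dt, k1, k2):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(s, d, h):
        return tuple(x + h * y for x, y in zip(s, d))

    d1 = derivatives(state, k1, k2)
    d2 = derivatives(shifted(state, d1, dt / 2), k1, k2)
    d3 = derivatives(shifted(state, d2, dt / 2), k1, k2)
    d4 = derivatives(shifted(state, d3, dt), k1, k2)
    return tuple(
        x + dt / 6 * (p + 2 * q + 2 * r + s)
        for x, p, q, r, s in zip(state, d1, d2, d3, d4)
    )


def simulate(initial, k1, k2, t_end, dt):
    """Integrate on a fine grid; this plays the role of the continuous ground truth."""
    n_steps = round(t_end / dt)
    times = [i * dt for i in range(n_steps + 1)]
    states = [tuple(initial)]
    for _ in range(n_steps):
        states.append(rk4_step(states[-1], dt, k1, k2))
    return times, states


def sample(times, states, sample_times):
    """Pick the simulated concentrations at a few discrete sampling times."""
    dt = times[1] - times[0]
    return [states[round(t / dt)] for t in sample_times]


# Illustrative values only: initial concentrations, rates and sampling schedule.
times, curve = simulate(initial=(1.0, 0.0), k1=0.5, k2=0.2, t_end=10.0, dt=0.01)
samples = sample(times, curve, sample_times=[0, 2, 4, 6, 8, 10])

for t, (a, b) in zip([0, 2, 4, 6, 8, 10], samples):
    print(f"t={t:>2}  A={a:.4f}  B={b:.4f}")
```

Because this toy system is linear, the sampled values can be checked against the closed-form solution, which is what "known ground truth" means in this simplified setting. A realistic simulation would involve many more metabolites and interactions.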



Introduction

The field of metabolomics, which studies small molecules inside organisms, has expanded considerably over the last two decades and the amount of data being generated by metabolomics experiments keeps increasing. For many “typical” experiments, standardized workflows are available. Many initiatives have focused on providing free and open access workflows and pipelines for metabolomics data analysis. Workflow4Metabolomics [1] (W4M) provides an intuitive way of constructing workflows by linking modules together. These modules provide a multitude of steps for preprocessing, statistics, normalization and others. Many of these tools were originally written in R, Python, etc. but have been converted to the Galaxy [2] environment that underlies
