Abstract

Background

A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to reproduction and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility. The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases. Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work. To ensure broad use within the community such a framework also needs to be inclusive and intuitive for both computational novices and experts alike.

Aim of Review

To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science.

Key Scientific Concepts of Review

This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.

Highlights

  • Journal articles have been the primary medium for sharing new scientific research

  • We provide a brief overview of current data science frameworks relevant to the metabolomics community, corresponding barriers to achieving open science, and a practical solution in the form of the computational lab notebook, where code, prose and figures are combined into an interactive notebook that can be published online and accessed in a modern web browser through cloud computing

  • The remainder of this review provides readers with an experiential learning opportunity (Kolb 1984) using an example interactive metabolomics data analysis workflow deployed using a combination of Python, Jupyter Notebooks, and Binder



Introduction

Journal articles have been the primary medium for sharing new scientific research. To fully embrace the concept of ‘open data science’ the metabolomics community needs an open and accessible computational environment for rapid collaboration and experimentation. The subject of this tutorial review is a practical open-science solution to this problem that balances ease-of-use and flexibility, targeted to novice metabolomic data scientists. This solution takes the form of ‘computational lab books’, such as Jupyter Notebooks (Kluyver et al. 2016), that have a diverse range of overlapping potential applications in the post-genomic research community (Fig. 1). The overarching aim of this document is to encourage metabolomics researchers from all backgrounds, possibly with little or no computational expertise, to seize the opportunity to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science.
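To make the idea of a computational lab book concrete, the following is a minimal sketch of how prose and code can sit side by side in one notebook document. It assumes the version 4 notebook JSON layout used by Jupyter; the helper name `make_notebook` and the cell contents (including the peak-area values) are hypothetical and chosen only for illustration, not taken from the tutorials.

```python
import json


def make_notebook(cells):
    """Build a minimal notebook dict, assuming the nbformat 4 JSON layout.

    `cells` is a list of (cell_type, source) pairs, where cell_type is
    "markdown" (prose) or "code".
    """
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells additionally carry an execution counter and outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {
        "cells": nb_cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# Hypothetical content: one prose cell followed by one code cell.
notebook = make_notebook([
    ("markdown", "# Example analysis\nProse describing the method."),
    ("code", "peak_areas = [1.0, 2.5, 4.0]\nprint(sum(peak_areas) / len(peak_areas))"),
])

# Serialised form; saved with an .ipynb extension, this text is the kind of
# file that Jupyter opens as an interactive notebook.
notebook_json = json.dumps(notebook, indent=1)
```

Because the notebook is plain JSON text, it can be version-controlled and shared like any other file, which is what makes hosting on a repository such as GitHub a natural fit.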

Software tools and barriers to open science
Collaboration through cloud computing
Experiential learning tutorials
Jupyter Notebook
GitHub
Binder
Tutorial 1: launching and using a Jupyter Notebook on Binder
Tutorial 2: interacting with and editing a Jupyter
Tutorial 3: downloading and installing a Jupyter
Tutorial 4: creating a new Jupyter Notebook on a local computer
Tutorial 5: deploying a Jupyter Notebook on Binder via GitHub
Summary
Findings
Compliance with ethical standards