Abstract

As a powerful phenotyping technology, metabolomics provides new opportunities in biomarker discovery through metabolome-wide association studies (MWAS) and the identification of metabolites with a regulatory role in various biological processes. While mass spectrometry-based (MS) metabolomics assays offer high throughput and sensitivity, MWAS require long-term data acquisition, which generates an overtime analytical signal drift that can obscure real, biologically relevant changes. We developed “dbnorm”, a package in the R environment that allows for an easy comparison of the performance of advanced statistical models commonly used in metabolomics to remove batch effects from large metabolomics datasets. “dbnorm” integrates advanced statistical tools to inspect the dataset structure not only at the macroscopic (sample batches) scale, but also at the microscopic (metabolic features) level. To compare model performance on data correction, “dbnorm” assigns a score that helps users identify the best-fitting model for each dataset. In this study, we applied “dbnorm” to two large-scale metabolomics datasets as a proof of concept. We demonstrate that “dbnorm” allows for the accurate selection of the most appropriate statistical tool to efficiently remove the overtime signal drift and to focus on the relevant biological components of complex datasets.
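The macroscopic inspection described above (is the largest variance in the dataset driven by batch membership?) can be illustrated with a minimal sketch. This is not dbnorm's implementation; it uses toy numbers for a single metabolic feature and compares the variance of per-batch means (between-batch) with the pooled variance inside batches (within-batch) — a ratio far above 1 suggests batch order dominates the signal.

```python
from statistics import mean, pvariance

# Toy intensities for one metabolic feature across three analytical batches
# (hypothetical values; batch 2 and 3 show an upward signal drift).
batches = [
    [10.1, 10.4, 9.8, 10.2],   # batch 1
    [12.0, 12.3, 11.8, 12.1],  # batch 2
    [14.2, 13.9, 14.4, 14.0],  # batch 3
]

grand = mean(v for b in batches for v in b)

# Between-batch variance: spread of the batch means around the grand mean.
between = mean((mean(b) - grand) ** 2 for b in batches)

# Within-batch variance: pooled population variance inside each batch.
within = mean(pvariance(b) for b in batches)

ratio = between / within
print(f"between/within variance ratio: {ratio:.1f}")
```

With these hypothetical numbers the ratio is far above 1, i.e., batch effects, not biology, account for the dominant variance component — exactly the situation that calls for correction.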

Highlights


  • Monitoring of batch effects with “dbnorm” in the LC–MS-based targeted metabolomics analysis of the SKIPOGH human cross-sectional study: 1,079 plasma samples were analyzed in analytical batches over a period of months, and 239 metabolites were detected. Principal component analysis (PCA) of the raw data shows a separation of sample clusters mainly driven by batch order

  • The first dataset on which “dbnorm” was tested is a targeted measurement of the plasma metabolome of 1,079 individuals in the human SKIPOGH cohort. These samples were analyzed in analytical batches over a period of months, together with 135 quality control (QC) samples that were injected every 10 samples
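Because QC samples are injected at regular intervals, a trend in their measured intensity over injection order is a direct readout of overtime signal drift. The sketch below uses hypothetical QC intensities (not the actual SKIPOGH QC data) and fits an ordinary least-squares slope against injection order; a clearly negative slope indicates signal decay across the run.

```python
# Hypothetical QC injections: one QC every 10 study samples,
# with a gradual loss of signal intensity over the run.
qc_order = [10, 20, 30, 40, 50, 60]
qc_intensity = [100.0, 98.5, 97.2, 95.8, 94.1, 92.9]

n = len(qc_order)
mx = sum(qc_order) / n
my = sum(qc_intensity) / n

# Ordinary least-squares slope of intensity vs. injection order.
slope = sum((x - mx) * (y - my) for x, y in zip(qc_order, qc_intensity)) \
        / sum((x - mx) ** 2 for x in qc_order)
print(f"drift slope: {slope:.3f} intensity units per injection")
```

A slope near zero would indicate a stable instrument response; here the negative slope flags a drift that batch-correction models must remove before biological interpretation.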


Introduction

As a powerful phenotyping technology, metabolomics provides new opportunities in biomarker discovery through metabolome-wide association studies (MWAS) and the identification of metabolites with a regulatory role in various biological processes. LC–MS-based metabolomics assays suffer from inherent variations in the distribution of signal measurements and/or in signal sensitivity and intensity driven by external factors[14]. This signal drift is a major limitation of current data normalization in biomedical and clinical studies, resulting from unavoidable technical variations introduced during sample preparation and analysis[15]. The analysis of large sets of biological samples needs to be completed in different analytical blocks (i.e., batches) over several weeks or even months[15,16]. In this case, the largest variance in the dataset may be attributable to batch effects or experimental run order, hindering the identification of real biological differences and true functional signals and leading to data misinterpretation[15,17,18]. Large metabolomics datasets therefore need to be corrected for unwanted within- and between-batch analytical variation to make the data comparable and to reveal biologically relevant changes[19,20,21].
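The within- and between-batch correction mentioned above can be sketched in its simplest form. This is only an illustrative baseline, not one of the models dbnorm compares: each batch is aligned by subtracting its own per-feature median and adding back the global median, so that batches become comparable while relative differences inside a batch are preserved.

```python
from statistics import median

# Toy intensities of one metabolite across two batches
# (hypothetical values; batch2 has drifted upward over the run order).
data = {
    "batch1": [10.2, 10.8, 9.9, 10.5],
    "batch2": [13.1, 13.6, 12.9, 13.4],
}

# Global median over all samples, used as the common anchor.
global_med = median(v for vals in data.values() for v in vals)

# Median-centering per batch: remove each batch's own location,
# then re-anchor all batches at the global median.
corrected = {
    b: [v - median(vals) + global_med for v in vals]
    for b, vals in data.items()
}
```

After correction, every batch has the same median, so the between-batch offset is removed; more sophisticated models (e.g., those benchmarked by dbnorm) additionally handle feature-specific and order-dependent drift.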

