A hierarchical approach to removal of unwanted variation for large-scale metabolomics data

Taiyun Kim, Owen Tang, John F O’Sullivan, Stuart M Grieve, Jean Yee Hwa Yang, Pengyi Yang, Yen Chin Koay, Stephen T Vernon, David E James, Gemma A Figtree, Terence P Speed, Katharine A Kott, John Park

doi:10.1038/s41467-021-25210-5

Abstract

Liquid chromatography-mass spectrometry-based metabolomics studies are increasingly applied to large population cohorts, where data acquisition can run for several weeks or even years. This inevitably introduces unwanted intra- and inter-batch variation over time that can overshadow true biological signals and thus hinder potential biological discoveries. To date, normalisation approaches have struggled to mitigate the variability introduced by technical factors whilst preserving biological variance, especially for protracted acquisitions. Here, we propose a study design framework that embeds biological sample replicates to quantify variance within and between batches, together with a workflow that uses these replicates to remove unwanted variation in a hierarchical manner (hRUV). We use this design to produce a dataset of more than 1000 human plasma samples run over an extended period of time. We demonstrate significant improvement of hRUV over existing methods in preserving biological signals whilst removing unwanted variation for large-scale metabolomics studies. Our tools not only provide a strategy for large-scale data normalisation, but also provide guidance on the design strategy for large omics studies.
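The two-level idea in the abstract (correct drift within each batch, then use samples replicated across batches to bring the batches onto a common scale) can be illustrated with a minimal Python sketch. It is not the hRUV implementation, which is distributed as an R package. The linear drift fit and the mean replicate offset used here are simplified stand-ins, and the function names `remove_linear_drift` and `align_batches`, the data layout, and the single-metabolite log-intensity input are all assumptions made for illustration.

```python
import numpy as np


def remove_linear_drift(values, run_order):
    """Intra-batch step (illustrative): remove a least-squares linear trend
    in run order from one metabolite, keeping the batch mean unchanged."""
    values = np.asarray(values, dtype=float)
    run_order = np.asarray(run_order, dtype=float)
    slope, _intercept = np.polyfit(run_order, values, deg=1)
    # Centre run order so that only the slope, not the mean level, is removed.
    return values - slope * (run_order - run_order.mean())


def align_batches(batches, replicate_pairs):
    """Inter-batch step (illustrative): shift each batch onto the scale of
    the previous, already aligned batch using replicated samples.

    batches: list of 1-D arrays of log-intensities for one metabolite.
    replicate_pairs[k]: (idx_prev, idx_next), where sample idx_prev[i] of
    batch k was re-run as sample idx_next[i] of batch k + 1.
    """
    aligned = [np.asarray(batches[0], dtype=float)]
    for k in range(1, len(batches)):
        idx_prev, idx_next = replicate_pairs[k - 1]
        current = np.asarray(batches[k], dtype=float)
        # Mean difference between replicate measurements estimates the
        # batch offset relative to the previous aligned batch.
        offset = np.mean(current[idx_next] - aligned[k - 1][idx_prev])
        aligned.append(current - offset)
    return aligned


# Toy example: pure drift of 0.1 per injection, and a second batch shifted by 2.
flat = remove_linear_drift(5 + 0.1 * np.arange(6), np.arange(6))
merged = align_batches([[1.0, 2.0, 3.0], [5.0, 9.0]], [([2], [0])])
```

The two steps can be chained by applying `remove_linear_drift` to every batch before passing the results to `align_batches`. One caveat of this simplified version is that a trend fitted to all samples will also absorb any biological signal that happens to correlate with run order, which is one reason a design with embedded replicates is useful.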

Highlights

Liquid chromatography-mass spectrometry-based metabolomics studies are increasingly applied to large population cohorts, where data acquisition can run for several weeks or even years.
We compare hRUV against a number of recently developed and commonly used methods in popular pipelines when applied to large cohort studies, such as Support Vector Regression (SVR)[5], Systematic Error Removal using Random Forest (SERRF)[15], and Removal of Unwanted Variation based approaches[22,23] (Table 1).
We developed a series of technical replications designed as a framework to enable effective data harmonisation in large cohort studies over extended periods of time.

Summary

Introduction

Liquid chromatography-mass spectrometry-based metabolomics studies are increasingly applied to large population cohorts, where data acquisition can run for several weeks or even years. This inevitably introduces unwanted intra- and inter-batch variation over time that can overshadow true biological signals and hinder potential biological discoveries. An in-house targeted metabolomics study of a hospital-based cohort of patients with atherosclerosis (BioHEART-CT) was conducted using the proposed sample arrangement strategy. We use this dataset to assess normalisation on a number of criteria, including retention of biological signal, low variability among replicates, and reproducibility of results, in comparison with other existing methods. The hRUV method is accessible as an R package and as a Shiny application at https://shiny.maths.usyd.edu.au/hRUV/
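One of the assessment criteria named above, low variability among replicates, can be made concrete with a small Python sketch. The choice of the coefficient of variation (CV) as the metric, the function name `replicate_cv`, and the samples-by-metabolites layout are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def replicate_cv(intensities, replicate_ids):
    """Median coefficient of variation per metabolite across replicate groups.

    intensities: array of shape (n_samples, n_metabolites), raw (not log) scale.
    replicate_ids: length n_samples labels; samples sharing a label are
    replicate measurements of the same biological sample.
    """
    intensities = np.asarray(intensities, dtype=float)
    replicate_ids = np.asarray(replicate_ids)
    cvs = []
    for rid in np.unique(replicate_ids):
        group = intensities[replicate_ids == rid]
        if group.shape[0] < 2:
            continue  # a single measurement carries no replicate information
        cvs.append(group.std(axis=0, ddof=1) / group.mean(axis=0))
    if not cvs:
        raise ValueError("no label occurs at least twice")
    return np.median(np.vstack(cvs), axis=0)


# Toy example: one metabolite, one sample measured twice (9 and 11).
cv_example = replicate_cv([[9.0], [11.0]], ["a", "a"])
```

Under this assumed metric, a normalisation method performs better on the criterion when `replicate_cv` is lower after normalisation than before, provided biological signal is retained at the same time.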

Journal: Nature Communications
Publication Date: Aug 17, 2021
Citations: 27
License type: open-access

Similar Papers

Normalization and integration of large-scale metabolomics data using support vector regression
Xiaotao Shen ... Zheng-Jiang Zhu
Metabolomics | VOL. 12
26 Mar 2016

Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data
Sili Fan ... Tobias Kind
Analytical Chemistry | VOL. 91
13 Feb 2019

Utilizing metabolomics to distinguish asthma phenotypes: strategies and clinical implications
N Reisdorph ... M E Wechsler
Allergy | VOL. 68
01 Aug 2013

WaveICA: A novel algorithm to remove batch effects for large-scale untargeted metabolomics data based on wavelet analysis
Kui Deng ... Zhenzi Li
Analytica Chimica Acta | VOL. 1061
19 Feb 2019
