Abstract

With the growth of metabolomics research, more and more studies are conducted on large numbers of samples. Due to technical limitations of the Liquid Chromatography–Mass Spectrometry (LC/MS) platform, samples often need to be processed in multiple batches. Across different batches, we often observe differences in data characteristics. In this work, we specifically focus on data generated in multiple batches on the same LC/MS machinery. Traditional preprocessing methods treat all samples as a single group. Such practice can result in errors in the alignment of peaks, which cannot be corrected by post hoc application of batch effect correction methods. In this work, we developed a new approach that address the batch effect issue in the preprocessing stage, resulting in better peak detection, alignment and quantification. It can be combined with down-stream batch effect correction methods to further correct for between-batch intensity differences. The method is implemented in the existing workflow of the apLCMS platform. Analyzing data with multiple batches, both generated from standardized quality control (QC) plasma samples and from real biological studies, the new method resulted in feature tables with better consistency, as well as better down-stream analysis results. The method can be a useful addition to the tools available for large studies involving multiple batches. The method is available as part of the apLCMS package. Download link and instructions are at https://mypage.cuhk.edu.cn/academics/yutianwei/apLCMS/.

Highlights

  • With the growth of metabolomics research, more and more studies are conducted on large numbers of samples

  • Using a dataset from a quality control sample, a yeast cell line dataset, and a dataset generated from healthy human plasma samples, we show the method offers higher consistency in feature quantification for studies involving multiple batches, yielding better results in down-stream analyses

  • We focused on the baseline metabolomics data and its relation with baseline body mass index (BMI) in this analysis

Read more

Summary

Introduction

With the growth of metabolomics research, more and more studies are conducted on large numbers of samples. We developed a new approach that address the batch effect issue in the preprocessing stage, resulting in better peak detection, alignment and quantification. It can be combined with down-stream batch effect correction methods to further correct for between-batch intensity differences. Analyzing data with multiple batches, both generated from standardized quality control (QC) plasma samples and from real biological studies, the new method resulted in feature tables with better consistency, as well as better down-stream analysis results. Using traditional data preprocessing approaches, we either treat all the samples as a single batch, or preprocess different batch individually, and seek to merge the feature tables. Without between-batch RT correction and weak signal recovery across batches, low intensity features that are initially identified in a subset of batches cannot be accurately quantified in the remaining batches

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call