A Python-Based Pipeline for Preprocessing LC-MS Data for Untargeted Metabolomics Workflows

Gabriel Riquelme, María Eugenia Monge, Pablo Marchi, Christina M Jones, Nicolás Zabalegui

Metabolites 2020, 10, 416. doi:10.3390/metabo10100416

Abstract

Preprocessing data in a reproducible and robust way is one of the current challenges in untargeted metabolomics workflows. Data curation in liquid chromatography–mass spectrometry (LC–MS) involves the removal of biologically non-relevant features (retention time, m/z pairs) to retain only high-quality data for subsequent analysis and interpretation. The present work introduces TidyMS, a package for the Python programming language for preprocessing LC–MS data for quality control (QC) procedures in untargeted metabolomics workflows. It is a versatile strategy that can be customized or fit for purpose according to the specific metabolomics application. It allows performing quality control procedures to ensure accuracy and reliability in LC–MS measurements, and it allows preprocessing metabolomics data to obtain cleaned matrices for subsequent statistical analysis. The capabilities of the package are shown with pipelines for an LC–MS system suitability check, system conditioning, signal drift evaluation, and data curation. These applications were implemented to preprocess data corresponding to a new suite of candidate plasma reference materials developed by the National Institute of Standards and Technology (NIST; hypertriglyceridemic, diabetic, and African-American plasma pools) to be used in untargeted metabolomics studies in addition to NIST SRM 1950 Metabolites in Frozen Human Plasma. The package offers a rapid and reproducible workflow that can be used in an automated or semi-automated fashion, and it is an open and free tool available to all users.

Highlights

There has been an increasing awareness in the international metabolomics community about the need for implementing quality assurance (QA) and quality control (QC) processes to ensure data quality and reproducibility [1,2,3,4,5,6].
TidyMS was designed with the goal of preprocessing and curating data from any untargeted
Different descriptors are associated with the data matrix, including the experimental exact mass and retention time (Rt) values for features, and run order for samples.
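
The pairing of a data matrix with feature and sample descriptors can be pictured with a small sketch. This is not the TidyMS API: the class names, field names, retention time unit, and all numeric values below are hypothetical and chosen only for illustration, using the Python standard library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Descriptors attached to one column of the data matrix (hypothetical)."""
    mz: float  # experimental exact mass (m/z)
    rt: float  # retention time; seconds are an assumed unit


@dataclass(frozen=True)
class Sample:
    """Descriptors attached to one row of the data matrix (hypothetical)."""
    name: str
    run_order: int  # position in the injection sequence


@dataclass
class DataMatrix:
    samples: list      # one Sample per row
    features: list     # one Feature per column
    intensities: list  # intensities[i][j]: sample i, feature j

    def __post_init__(self):
        # Every sample needs a row, and every row one value per feature.
        if len(self.intensities) != len(self.samples):
            raise ValueError("one intensity row is required per sample")
        for row in self.intensities:
            if len(row) != len(self.features):
                raise ValueError("one intensity value is required per feature")

    def in_run_order(self):
        """Return a copy with rows sorted by injection order."""
        order = sorted(range(len(self.samples)),
                       key=lambda i: self.samples[i].run_order)
        return DataMatrix(
            samples=[self.samples[i] for i in order],
            features=list(self.features),
            intensities=[list(self.intensities[i]) for i in order],
        )


# Made-up values, for illustration only.
matrix = DataMatrix(
    samples=[Sample("QC-2", run_order=3), Sample("QC-1", run_order=1)],
    features=[Feature(mz=180.0634, rt=95.2), Feature(mz=255.2330, rt=410.8)],
    intensities=[[1200.0, 530.0], [1150.0, 560.0]],
)
ordered = matrix.in_run_order()
print([s.name for s in ordered.samples])  # ['QC-1', 'QC-2']
```

Keeping run order next to the intensities is what makes order-dependent checks, such as signal drift evaluation, possible without consulting a separate file.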

Summary

Introduction

There has been an increasing awareness in the international metabolomics community about the need for implementing quality assurance (QA) and quality control (QC) processes to ensure data quality and reproducibility [1,2,3,4,5,6]. Several software packages are available to the metabolomics community for preprocessing LC–MS data, such as MZmine, XCMS, MS-DIAL, and workflow4metabolomics [8,12,13,14], among others. These software packages perform feature detection and correspondence, and provide an extracted data matrix as output for subsequent analysis. Preprocessing LC–MS-based untargeted metabolomics data also involves the removal of unwanted features (retention time, m/z pairs) to retain only those analytically robust enough for data analysis and interpretation [15,16]. To this end, several tools are available, such as SECIM-TOOLS [17].
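
To make the idea of removing analytically unreliable features concrete, here is a minimal sketch of one widely used criterion: discarding features whose coefficient of variation (CV) across repeated QC injections exceeds a threshold. The text above does not state which criteria are applied, so the criterion itself, the function names, the 0.25 threshold, and the intensities are assumptions made for illustration. This is not the TidyMS API.

```python
from statistics import mean, stdev


def qc_cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    m = mean(values)
    if m <= 0:
        # A feature never detected in the QC samples is treated as unreliable.
        return float("inf")
    return stdev(values) / m


def robust_features(qc_intensities, max_cv=0.25):
    """Return the ids of features whose QC CV is at most max_cv.

    qc_intensities maps a feature id to its intensities across repeated
    QC injections; max_cv=0.25 is an assumed, illustrative threshold.
    """
    return [fid for fid, values in qc_intensities.items()
            if qc_cv(values) <= max_cv]


# Made-up intensities for three hypothetical features.
qc_intensities = {
    "FT001": [1000.0, 1050.0, 980.0, 1020.0],  # stable signal
    "FT002": [200.0, 800.0, 50.0, 1200.0],     # erratic signal
    "FT003": [0.0, 0.0, 0.0, 0.0],             # never detected
}
kept = robust_features(qc_intensities)
print(kept)  # ['FT001']
```

In practice such a filter would typically be combined with other checks, and the threshold would be chosen to suit the specific study.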

Conclusion