Abstract

Non-targeted mass spectrometry (MS) has become an important method in metabolomics and environmental research in recent years. While more and more algorithms and workflows are becoming available to process large numbers of non-targeted data sets, few manually evaluated, universal test data sets exist for refining and evaluating these methods. Peak detection and its refinement, the first step of non-targeted screening, is arguably also the most important one. However, the absence of a model data set makes it harder for researchers to evaluate peak detection methods. In this Data Descriptor, we provide a manually checked data set consisting of 255,000 extracted ion chromatograms (EICs; 5000 peaks randomly sampled from across 51 samples) for the evaluation of peak detection and gap-filling algorithms. The data set was created from a previous real-world study, a subset of which was used to extract ion chromatograms that were then manually classified by three mass spectrometry experts. The data set consists of the converted mass spectrometry files, intermediate processing files, and a central file containing a table with all important information for the classified peaks.
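As a rough illustration of how a manually classified data set of this kind might be used, the sketch below scores the output of a peak detection algorithm against expert labels. The record layout (`peak_id`, `sample_id`, `manual_is_peak`) and the function name `score_detector` are assumptions made for this example; they do not reflect the actual columns of the central table.

```python
from dataclasses import dataclass

# 5000 peaks, each checked in 51 samples, gives the 255,000 EICs of the abstract.
N_EICS = 5000 * 51


@dataclass(frozen=True)
class Eic:
    """One manually classified EIC (hypothetical layout, not the real table)."""
    peak_id: int
    sample_id: int
    manual_is_peak: bool  # expert decision: True if a real peak is present


def score_detector(eics, detected):
    """Compare algorithm output with the manual classification.

    `detected` is a set of (peak_id, sample_id) pairs that the algorithm
    under test reported as peaks.
    """
    tp = fp = fn = tn = 0
    for eic in eics:
        hit = (eic.peak_id, eic.sample_id) in detected
        if hit and eic.manual_is_peak:
            tp += 1
        elif hit:
            fp += 1
        elif eic.manual_is_peak:
            fn += 1
        else:
            tn += 1
    # Guard against division by zero when a class is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}
```

Precision and recall are only one possible choice of metrics; a gap-filling algorithm could be scored the same way by restricting `eics` to the entries it was asked to fill.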

Highlights

  • Rather than just monitoring a predetermined set of substances, which elute at specific retention times with a specific mass-to-charge ratio (m/z), non-targeted LC-mass spectrometry (MS)

  • The collection of ion chromatograms that reveal the substances without a priori knowledge of their identity in these samples is the first and an especially important step to gather as complete information as possible from the mass spectral raw data

  • Handling thousands of peaks requires reliable and optimized algorithms for sample processing. Optimization of these algorithms is based on design of experiments (DOE)
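To make the DOE-based optimization in the last highlight more concrete, the following minimal sketch enumerates a full factorial design over a few algorithm parameters and picks the best-scoring setting. A full factorial design is only one possible experimental design and is not necessarily the one the authors used; the helper names `full_factorial` and `best_setting`, and any parameter names passed in, are hypothetical.

```python
from itertools import product


def full_factorial(levels):
    """Return every combination of parameter levels as a list of dicts.

    `levels` maps a parameter name to the list of values to try.
    """
    names = list(levels)
    return [dict(zip(names, combo))
            for combo in product(*(levels[name] for name in names))]


def best_setting(levels, score):
    """Return the parameter combination with the highest score.

    `score` is a user-supplied function that runs the algorithm with one
    setting and returns a quality measure (for example, agreement with a
    manually classified test set).
    """
    return max(full_factorial(levels), key=score)
```

With two parameters at two and three levels, for instance, `full_factorial` yields six runs. The number of runs grows multiplicatively with each added parameter, which is why fractional or other reduced designs are often preferred in practice.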


