Abstract

The annotation of small molecules in untargeted mass spectrometry relies on the matching of fragment spectra to reference library spectra. While various spectrum-spectrum match scores exist, the field lacks statistical methods for estimating the false discovery rate (FDR) of these annotations. We present empirical Bayes and target-decoy based methods to estimate the FDR for 70 public metabolomics data sets. We show that the spectral matching settings need to be adjusted for each project. When the scoring parameters and thresholds were adjusted, the number of annotations rose by 139% on average (range: −92% to +5705%) compared with a default parameter set available at GNPS. The FDR estimation methods presented will enable users to assess scoring criteria for large-scale analysis of mass spectrometry based metabolomics data, a capability that has been essential to the advancement of proteomics, transcriptomics, and genomics.

Highlights

  • The annotation of small molecules in untargeted mass spectrometry relies on the matching of fragment spectra to reference library spectra

  • To validate the false discovery rate (FDR) approach and assess how it performs for spectral annotation on real, large-scale untargeted mass spectrometry data, we performed FDR-controlled spectral library matching on 70 data sets from GNPS, comprising thousands of LC–mass spectrometry (MS) runs

  • Four methods were implemented and assessed for estimating the significance of annotations in untargeted mass spectrometry: an empirical Bayes approach, which involves a probabilistic model of score distributions, and three different target-decoy approaches

Summary

Introduction

The annotation of small molecules in untargeted mass spectrometry relies on the matching of fragment spectra to reference library spectra. The FDR estimation methods presented will enable users to assess scoring criteria for large-scale analysis of mass spectrometry based metabolomics data, a capability that has been essential to the advancement of proteomics, transcriptomics, and genomics. To validate the FDR approach and assess how it performs for spectral annotation on real, large-scale untargeted mass spectrometry data, we performed FDR-controlled spectral library matching on 70 data sets from GNPS, comprising thousands of LC–MS runs. This revealed that no universal scoring criterion can control the FDR in all data sets. Passatutto provides experimentalists with a high-throughput measure of confidence in MS/MS-based annotations by reporting an FDR, which guides the selection of scoring parameters for a project in a manner compatible with large-scale MS/MS-based untargeted metabolomics projects.
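To make the target-decoy idea concrete, the following is a minimal sketch of a generic separate target-decoy FDR estimate. It is not the Passatutto implementation. It assumes the query spectra were searched once against a target library and once against a decoy library of equal size, and that a higher score means a better match. The function names `target_decoy_fdr` and `lowest_threshold_for_fdr` are hypothetical and chosen only for illustration.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target hits scoring at or above `threshold`.

    Assumption (not from the source): target and decoy libraries have the
    same size, so the number of decoy hits above the threshold approximates
    the number of false target hits above it.
    """
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0  # no accepted annotations, so nothing can be false
    return min(1.0, n_decoy / n_target)


def lowest_threshold_for_fdr(target_scores, decoy_scores, max_fdr):
    """Return the lowest score threshold whose estimated FDR is <= max_fdr.

    Candidate thresholds are the observed target scores, tried in ascending
    order so that the first acceptable one keeps the most annotations.
    Returns None when no threshold meets the requested FDR.
    """
    for threshold in sorted(set(target_scores)):
        if target_decoy_fdr(target_scores, decoy_scores, threshold) <= max_fdr:
            return threshold
    return None
```

Running such a threshold search separately for each data set reflects the observation above that no single scoring criterion controls the FDR everywhere: the threshold that satisfies a given FDR depends on the score distributions of that particular project.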
