Abstract

Metabolite identification for untargeted metabolomics is often hampered by the lack of experimentally collected reference spectra from tandem mass spectrometry (MS/MS). To circumvent this problem, Competitive Fragmentation Modeling-ID (CFM-ID) was developed to accurately predict electrospray ionization-MS/MS (ESI-MS/MS) spectra from chemical structures and to aid in compound identification via MS/MS spectral matching. While earlier versions of CFM-ID performed very well overall, their performance in predicting the MS/MS spectra of certain classes of compounds, including many lipids, was quite poor. Furthermore, CFM-ID’s compound identification capabilities were limited because it neither used available experimental MS/MS spectra nor exploited metadata in its spectral matching algorithm. Here, we describe significant improvements to CFM-ID’s performance and speed. These include (1) the implementation of a rule-based fragmentation approach for lipid MS/MS spectral prediction, which greatly improves the speed and accuracy of CFM-ID; (2) the inclusion of experimental MS/MS spectra and other metadata to enhance CFM-ID’s compound identification abilities; (3) the development of new scoring functions that improve CFM-ID’s accuracy by 21.1%; and (4) the implementation of a chemical classification algorithm that correctly classifies unknown chemicals (based on their MS/MS spectra) in >80% of cases. This improved version, called CFM-ID 3.0, is freely available as a web server, and its source code is also accessible online.
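To make the idea of rule-based fragmentation concrete, here is a minimal Python sketch, assuming each rule is either a named neutral loss from the precursor or a fixed diagnostic fragment ion tied to one lipid class. The `Rule` type, the `predict_spectrum` function, the class name, the fragment m/z of 150.0, and the intensities are all hypothetical placeholders, not CFM-ID 3.0's curated rules; only the water mass is a real constant. The point is that peaks are derived deterministically from the precursor m/z by looking up class-specific rules, rather than being learned by a probabilistic fragmentation model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Monoisotopic mass of H2O (Da); water loss is a common generic neutral loss.
H2O = 18.010565


@dataclass(frozen=True)
class Rule:
    """One hypothetical fragmentation rule for a lipid class.

    Set exactly one of:
      neutral_loss -> product m/z = precursor m/z - neutral_loss (singly charged ions assumed)
      fragment_mz  -> a fixed diagnostic ion m/z, independent of the precursor
    """
    name: str
    relative_intensity: float
    neutral_loss: Optional[float] = None
    fragment_mz: Optional[float] = None


def predict_spectrum(
    precursor_mz: float,
    lipid_class: str,
    rules_by_class: Dict[str, List[Rule]],
    decimals: int = 4,
) -> List[Tuple[float, float]]:
    """Apply every rule registered for lipid_class; return sorted (m/z, intensity) peaks."""
    rules = rules_by_class.get(lipid_class)
    if rules is None:
        # Rule-based prediction only covers classes that have rules defined.
        raise KeyError(f"no rules for lipid class {lipid_class!r}")
    peaks: Dict[float, float] = {}
    for rule in rules:
        if rule.neutral_loss is not None:
            mz = precursor_mz - rule.neutral_loss
        elif rule.fragment_mz is not None:
            mz = rule.fragment_mz
        else:
            raise ValueError(f"rule {rule.name!r} defines no fragment")
        # For singly charged ions, skip products above the precursor or non-positive m/z.
        if mz <= 0 or mz > precursor_mz:
            continue
        mz = round(mz, decimals)
        # If two rules produce the same peak, keep the larger intensity.
        peaks[mz] = max(peaks.get(mz, 0.0), rule.relative_intensity)
    return sorted(peaks.items())


# Hypothetical rule set for illustration only; not CFM-ID's curated lipid rules.
HYPOTHETICAL_RULES = {
    "ExampleLipidClass": [
        Rule("intact precursor", 10.0, neutral_loss=0.0),
        Rule("loss of H2O", 100.0, neutral_loss=H2O),
        Rule("diagnostic head-group ion", 40.0, fragment_mz=150.0),
    ],
}

print(predict_spectrum(500.0, "ExampleLipidClass", HYPOTHETICAL_RULES))
# → [(150.0, 40.0), (481.9894, 100.0), (500.0, 10.0)]
```

Because prediction is a table lookup plus subtraction, it is fast, but it only works for classes that have rules; the `KeyError` branch marks where such a system would have to fall back to a general fragmentation model.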

Highlights

  • We have shown that it is possible to substantially improve the performance of Competitive Fragmentation Modeling-ID (CFM-ID) in both spectral prediction and compound identification tasks

  • These gains came from (1) integrating a rule-based fragmentation approach that currently applies 344 manually curated rules to predict the electrospray ionization (ESI)-MS/MS spectra for 21 classes of common, biologically important lipids, (2) modifying the structure of CFM-ID’s spectral database and increasing its size by a factor of 2.6, (3) designing new scoring functions that take into account both compound citation frequency and chemical classification features of candidate molecules, and (4) implementing a chemical classification algorithm based on spectral similarity
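The scoring and classification steps listed above can be illustrated together. The sketch below assumes each candidate already carries a spectral similarity score in [0, 1] (for example from a cosine comparison), a citation count, and a chemical class label. The `Candidate` type, the weighted-sum formula with a log-scaled citation prior, the weight `alpha`, and the functions `combined_score`, `rank_candidates`, and `classify` are hypothetical choices for illustration, not the published CFM-ID 3.0 scoring functions; the sketch also omits the chemical-classification features the real scores use. Classification is shown as a similarity-weighted vote among the top-k reference matches, one simple way to assign a class from spectral similarity.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    similarity: float  # spectral similarity to the query, assumed in [0, 1]
    citations: int     # hypothetical citation count, used as a prior
    chem_class: str    # chemical class label of the candidate/reference


def combined_score(c: Candidate, max_citations: int, alpha: float = 0.8) -> float:
    """Hypothetical score: alpha * similarity + (1 - alpha) * log-scaled citation prior."""
    prior = math.log1p(c.citations) / math.log1p(max_citations) if max_citations > 0 else 0.0
    return alpha * c.similarity + (1 - alpha) * prior


def rank_candidates(candidates: List[Candidate], alpha: float = 0.8) -> List[Candidate]:
    """Sort candidates by the combined score, best first."""
    max_cit = max((c.citations for c in candidates), default=0)
    return sorted(candidates, key=lambda c: combined_score(c, max_cit, alpha), reverse=True)


def classify(matches: List[Candidate], k: int = 3) -> Optional[str]:
    """Similarity-weighted vote over the k most similar reference spectra."""
    top = sorted(matches, key=lambda c: c.similarity, reverse=True)[:k]
    votes = defaultdict(float)
    for c in top:
        votes[c.chem_class] += c.similarity
    return max(votes, key=votes.get) if votes else None


# Hypothetical candidates for one query spectrum.
cands = [
    Candidate("A", 0.90, 2, "Flavonoids"),
    Candidate("B", 0.88, 500, "Flavonoids"),
    Candidate("C", 0.70, 50, "Fatty acyls"),
]
print([c.name for c in rank_candidates(cands)])  # → ['B', 'A', 'C']
print(classify(cands))  # → Flavonoids
```

With `alpha=1.0` the ranking reduces to pure spectral similarity (A, B, C); the citation prior is what lifts the frequently cited candidate B above A. Keeping the weight as a parameter makes that trade-off explicit rather than fixed.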

Summary

Introduction

Liquid chromatography (LC) coupled to mass spectrometry (MS) or tandem mass spectrometry (MS/MS) has become one of the leading techniques for compound identification in organic chemistry, natural product chemistry, and metabolomics [1,2]. To identify individual compounds, the resulting MS/MS spectra, along with the chromatographic retention time and parent ion mass of the compound of interest, are ideally compared with the MS/MS spectra and retention times of authentic standards to confirm the compound’s identity. Because most metabolomics labs have access to only a limited set of authentic chemical standards, putative metabolite identification is more commonly performed [3]. Putative identification (MSI level 2) is achieved by comparing the MS/MS spectra with experimentally collected reference spectra found in various MS/MS spectral databases. Key to the success of this putative identification process is the availability of a large, comprehensive database of experimentally collected MS/MS spectra of pure compounds that covers a large portion of “chemical space”. Publicly available databases of experimental MS/MS spectra currently cover a total of only ~20,000 unique compounds [4].
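The spectral comparison behind putative identification can be sketched as a peak-matched cosine score. This minimal Python example assumes centroided spectra given as lists of (m/z, intensity) pairs and a fixed absolute m/z tolerance of 0.01 (a hypothetical choice). The functions `cosine_similarity` and `rank_library` and the example spectra are made up for illustration. The matching is a simple greedy one-to-one alignment; it ignores the intensity/m/z weighting, precursor-mass filtering, and retention-time constraints that real library-search tools often add.

```python
import math


def cosine_similarity(query, reference, tol=0.01):
    """Greedy peak-matched cosine similarity between two centroided spectra.

    Each spectrum is a list of (mz, intensity) pairs. Query peaks are processed
    from most to least intense and paired one-to-one with the closest unused
    reference peak within +/- tol.
    """
    if not query or not reference:
        return 0.0
    used = set()
    dot = 0.0
    for q_mz, q_int in sorted(query, key=lambda p: p[1], reverse=True):
        best, best_diff = None, tol
        for j, (r_mz, _) in enumerate(reference):
            diff = abs(q_mz - r_mz)
            if j not in used and diff <= best_diff:
                best, best_diff = j, diff
        if best is not None:
            used.add(best)
            dot += q_int * reference[best][1]
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def rank_library(query, library, tol=0.01):
    """Return (name, score) pairs for every library spectrum, best match first."""
    scores = [(name, cosine_similarity(query, spec, tol)) for name, spec in library.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical reference library and query spectrum.
library = {
    "compound_X": [(91.054, 100.0), (119.049, 40.0), (137.060, 20.0)],
    "compound_Y": [(85.028, 100.0), (103.039, 60.0)],
}
query = [(91.055, 95.0), (119.050, 45.0), (137.059, 15.0)]

for name, score in rank_library(query, library):
    print(f"{name}: {score:.3f}")
# → compound_X: 0.997
# → compound_Y: 0.000
```

A score near 1 only says the fragment patterns agree; without a matching authentic standard and retention time, the top hit remains a putative (MSI level 2) identification, and it can only be found if the compound is in the library at all.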
