Abstract
Motivation: Untargeted mass spectrometry (MS/MS) is a powerful method for detecting metabolites in biological samples. However, fast and accurate identification of the metabolites' structures from MS/MS spectra remains a great challenge.
Results: We present a new analysis method, called SubFragment-Matching (SF-Matching), that is based on the hypothesis that molecules with similar structural features will exhibit similar fragmentation patterns. We combine information on the fragmentation patterns of molecules with shared substructures and then use random forest models to predict whether a given structure can yield a certain fragmentation pattern. These models can then be used to score candidate molecules for a given mass spectrum. For rapid identification, we pre-compute such scores for common biological molecular structure databases. Using benchmarking datasets, we find that our method has similar performance to CSI:FingerID and that very high accuracies can be achieved by combining our method with CSI:FingerID. Rarefaction analysis of the training dataset shows that the performance of our method will increase as more experimental data become available.
Availability and implementation: SF-Matching is available from http://www.bork.embl.de/Docu/sf_matching.
Supplementary information: Supplementary data are available at Bioinformatics online.
Highlights
Untargeted mass spectrometry (MS/MS) is a common approach for identification of metabolites in biological samples (Beger et al, 2016; O’Kell et al, 2017; Schrimpe-Rutledge et al, 2016).
In this approach, a complex biological sample is analyzed with liquid chromatography electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS), generating several thousand MS/MS spectra in a few minutes.
We search for molecular substructures first.
Summary
Untargeted mass spectrometry (MS/MS) is a common approach for identification of metabolites in biological samples (Beger et al, 2016; O’Kell et al, 2017; Schrimpe-Rutledge et al, 2016). Currently, the fastest way of analyzing such data is to match the fragmentation spectra of unknown substances against a reference spectral library (Kind et al, 2018). These spectral libraries are usually built from known purified metabolites or generated from researchers' experiments. Databases such as METLIN (Guijas et al, 2018), GNPS (Wang et al, 2016) and MassBank (Horai et al, 2010) collect these data. Although this experimental approach has the highest accuracy, generating these reference libraries is costly and time-consuming.
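To make the idea of spectral library matching concrete, the following is a minimal sketch of one common way to compare a query spectrum against reference spectra: bin the peaks by m/z and rank library entries by cosine similarity. It is an illustration only and is not the SF-Matching method or the scoring used by any of the databases named above. The function names, the bin width and the toy peak lists are all assumptions made for this example.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra (dicts of bin -> intensity)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(query_peaks, library, bin_width=0.01):
    """Return (name, score) pairs for all library entries, best match first."""
    query = bin_spectrum(query_peaks, bin_width)
    scored = [
        (name, cosine_similarity(query, bin_spectrum(peaks, bin_width)))
        for name, peaks in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: compound names and peak lists are invented for illustration.
toy_library = {
    "compound_A": [(85.03, 100.0), (127.04, 40.0), (145.05, 20.0)],
    "compound_B": [(91.05, 100.0), (119.05, 60.0)],
}
toy_query = [(85.03, 90.0), (127.04, 50.0), (145.05, 10.0)]

for name, score in search_library(toy_query, toy_library):
    print(f"{name}: {score:.3f}")
```

A match of this kind is only possible when the unknown compound already has a reference spectrum in the library, which is the limitation the paragraph above points to.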