Identification of metabolites from tandem mass spectra with a machine learning approach utilizing structural features.

Yuanyue Li, Peer Bork, Anne-Claude Gavin, Michael Kuhn

doi:10.1093/bioinformatics/btz736

Abstract

Motivation: Untargeted mass spectrometry (MS/MS) is a powerful method for detecting metabolites in biological samples. However, fast and accurate identification of the metabolites’ structures from MS/MS spectra is still a great challenge.

Results: We present a new analysis method, called SubFragment-Matching (SF-Matching), which is based on the hypothesis that molecules with similar structural features will exhibit similar fragmentation patterns. We combine information on fragmentation patterns of molecules with shared substructures and then use random forest models to predict whether a given structure can yield a certain fragmentation pattern. These models can then be used to score candidate molecules for a given mass spectrum. For rapid identification, we pre-compute such scores for common biological molecular structure databases. Using benchmarking datasets, we find that our method has similar performance to CSI:FingerID and that very high accuracies can be achieved by combining our method with CSI:FingerID. Rarefaction analysis of the training dataset shows that the performance of our method will increase as more experimental data become available.

Availability and implementation: SF-Matching is available from http://www.bork.embl.de/Docu/sf_matching.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

Untargeted mass spectrometry (MS/MS) is a common approach for identification of metabolites in biological samples (Beger et al, 2016; O’Kell et al, 2017; Schrimpe-Rutledge et al, 2016).
Thereby, a complex biological sample is analyzed with liquid chromatography electrospray ionization tandem MS/MS, generating several thousands of MS/MS spectra in a few minutes.
We search for molecular substructures first.

Summary

Introduction

Untargeted mass spectrometry (MS/MS) is a common approach for identification of metabolites in biological samples (Beger et al, 2016; O’Kell et al, 2017; Schrimpe-Rutledge et al, 2016). Currently, the fastest way to analyze such data is to match the fragmentation spectra of unknown substances against a reference spectral library (Kind et al, 2018). These spectral libraries are usually built from known, purified metabolites or generated from researchers' experiments. Databases such as METLIN (Guijas et al, 2018), GNPS (Wang et al, 2016) and MassBank (Horai et al, 2010) collect these data. Although this experimental approach has the highest accuracy, generating these reference libraries is expensive and time-consuming.
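To make the library-matching idea concrete, here is a minimal Python sketch of ranking reference spectra by cosine similarity to a query spectrum. It illustrates generic spectral library matching only, not the SF-Matching method or the search procedure of any database named above. The function names, the fixed-width binning, the bin width, and the example peaks are all assumptions made for illustration.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=0.01):
    """Sum peak intensities into fixed-width m/z bins (illustrative helper)."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spec_a, spec_b, bin_width=0.01):
    """Cosine similarity between two spectra given as (m/z, intensity) lists."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def rank_library(query, library, bin_width=0.01):
    """Return (name, score) pairs for all library entries, best match first."""
    scores = [
        (name, cosine_similarity(query, peaks, bin_width))
        for name, peaks in library.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up peaks for two hypothetical reference compounds.
library = {
    "compound_A": [(85.03, 90.0), (127.04, 50.0)],
    "compound_B": [(60.02, 100.0), (99.00, 30.0)],
}
query = [(85.03, 100.0), (127.04, 40.0)]
ranked = rank_library(query, library)
```

In this toy case the query shares both of its peaks with compound_A and none with compound_B, so compound_A ranks first. Real library search tools typically add precursor-mass filtering, m/z tolerance windows and intensity weighting, which this sketch omits.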

Journal: Bioinformatics	Publication Date: Oct 12, 2019
Citations: 32	License type: CC BY 4.0
