Abstract

Motivation: Identification of small molecules in a biological sample remains a major bottleneck in molecular biology, despite a decade of rapid development of computational approaches for predicting molecular structures using mass spectrometry (MS) data. Recently, there has been increasing interest in utilizing other information sources, such as liquid chromatography (LC) retention time (RT), to improve identifications solely based on MS information, such as precursor mass-per-charge and tandem mass spectrometry (MS2).

Results: We put forward a probabilistic modelling framework to integrate MS and RT data of multiple features in an LC-MS experiment. We model the MS measurements and all pairwise retention order information as a Markov random field and use efficient approximate inference for scoring and ranking potential molecular structures. Our experiments show improved identification accuracy by combining MS2 data and retention orders using our approach, thereby outperforming state-of-the-art methods. Furthermore, we demonstrate the benefit of our model when only a subset of LC-MS features has MS2 measurements available besides MS1.

Availability and implementation: Software and data are freely available at https://github.com/aalto-ics-kepaco/msms_rt_score_integration.

Supplementary information: Supplementary data are available at Bioinformatics online.
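To make the modelling idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes each LC-MS feature has a short candidate list, a hypothetical MS-based matching score per candidate, and a hypothetical predicted retention value per candidate. The edge potential (a sigmoid of the predicted difference), the `weight` parameter and all function names are illustrative assumptions. Instead of the efficient approximate inference used in the paper, the sketch computes exact max-marginals by brute-force enumeration, which is only feasible for a handful of features.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def edge_potential(rt_i, rt_j, pred_i, pred_j):
    """Assumed pairwise potential: close to 1 when the predicted order of two
    candidates agrees with the observed retention order of their features."""
    sign = 1.0 if rt_j > rt_i else -1.0 if rt_j < rt_i else 0.0
    return sigmoid(sign * (pred_j - pred_i))


def max_marginals(rts, ms_scores, rt_preds, weight=0.5):
    """Exact max-marginals of a fully connected toy MRF by enumeration.

    rts[i]          observed retention time of feature i
    ms_scores[i][a] positive MS-based score of candidate a for feature i
    rt_preds[i][a]  hypothetical predicted retention value of that candidate
    weight          trade-off between MS (node) and retention order (edge) terms
    """
    n = len(rts)
    best = [[-math.inf] * len(s) for s in ms_scores]
    for assignment in itertools.product(*(range(len(s)) for s in ms_scores)):
        node = sum(math.log(ms_scores[i][a]) for i, a in enumerate(assignment))
        edge = sum(
            math.log(edge_potential(rts[i], rts[j],
                                    rt_preds[i][assignment[i]],
                                    rt_preds[j][assignment[j]]))
            for i, j in itertools.combinations(range(n), 2)
        )
        score = (1.0 - weight) * node + weight * edge
        # Max-marginal: best joint score among assignments that pick a for feature i.
        for i, a in enumerate(assignment):
            best[i][a] = max(best[i][a], score)
    return best


def rank_candidates(marginals):
    """Candidate indices per feature, sorted from best to worst max-marginal."""
    return [sorted(range(len(m)), key=lambda a: -m[a]) for m in marginals]


# Toy data (made up for illustration): two features, two candidates each.
rts = [1.0, 2.0]
ms_scores = [[0.55, 0.45], [0.9, 0.1]]
rt_preds = [[5.0, 1.0], [3.0, 3.5]]

if __name__ == "__main__":
    print(rank_candidates(max_marginals(rts, ms_scores, rt_preds, weight=0.0)))
    print(rank_candidates(max_marginals(rts, ms_scores, rt_preds, weight=0.5)))
```

With `weight=0.0` the ranking uses the MS scores alone. With `weight=0.5` the retention order term promotes the second candidate of the first feature, because its predicted retention value is consistent with that feature eluting before the other one. For realistic numbers of features and candidates the enumeration is intractable, which is why approximate inference is needed in practice.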

Introduction

The identification of small molecules, such as metabolites or drugs, in biological samples is a challenging task that poses a bottleneck in various research fields, such as biomedicine, biotechnology, environmental chemistry and drug discovery. The samples typically contain thousands of different molecules, the vast majority of which remain unidentified (Aksenov et al., 2017; da Silva et al., 2015). Liquid chromatography (LC) separates molecules by their differential physicochemical interactions with the stationary and mobile phases, which results in retention time (RT) differences, and mass spectrometry (MS) separates molecular ions by their mass per charge (MS1). In an untargeted metabolomics experiment, large sets of MS features (MS1 and RT, plus optionally MS2) are observed, corresponding to the different molecules in the sample. Metabolite identification concerns the structural annotation of the observed MS features.
