BioSM: Metabolomics Tool for Identifying Endogenous Mammalian Biochemical Structures in Chemical Structure Space

Mai A Hamdalla, Dennis W Hill, David F Grant, Ion I Mandoiu, Sanguthevar Rajasekaran

doi:10.1021/ci300512q

Abstract

The structural identification of unknown biochemical compounds in complex biofluids continues to be a major challenge in metabolomics research. Using LC/MS, there are currently two major options for solving this problem: searching small biochemical databases, which often do not contain the unknown of interest, or searching large chemical databases, which include large numbers of nonbiochemical compounds. Searching larger chemical databases (larger chemical space) increases the odds of identifying an unknown biochemical compound, but only if nonbiochemical structures can be eliminated from consideration. In this paper we present BioSM, a cheminformatics tool that uses known endogenous mammalian biochemical compounds (as scaffolds) and graph matching methods to identify endogenous mammalian biochemical structures in chemical structure space. The results of a comprehensive set of empirical experiments suggest that BioSM identifies endogenous mammalian biochemical structures with high accuracy. In a leave-one-out cross validation experiment, BioSM correctly predicted 95% of 1388 Kyoto Encyclopedia of Genes and Genomes (KEGG) compounds as endogenous mammalian biochemicals using 1565 scaffolds. Analysis of two additional biological data sets containing 2330 human metabolites (HMDB) and 2416 plant secondary metabolites (KEGG) resulted in biochemical annotations of 89% and 72% of the compounds, respectively. When a data set of 3895 drugs (DrugBank and USAN) was tested, 48% of these structures were predicted to be biochemical. However, when a set of synthetic chemical compounds (Chembridge and Chemsynthesis databases) was examined, only 29% of the 458,207 structures were predicted to be biochemical. Moreover, BioSM predicted that 34% of 883,199 randomly selected compounds from PubChem were biochemical. We then expanded the scaffold list to 3927 biochemical compounds and reevaluated the above data sets to determine whether scaffold number influenced model performance. Although there were significant improvements in model sensitivity and specificity using the larger scaffold list, the data set comparison results were very similar. These results suggest that additional biochemical scaffolds will not further improve our representation of biochemical structure space and that the model is reasonably robust. BioSM provides a qualitative (yes/no) and quantitative (ranking) method for endogenous mammalian biochemical annotation of chemical space and, thus, will be useful in the identification of unknown biochemical structures in metabolomics. BioSM is freely available at http://metabolomics.pharm.uconn.edu.
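The leave-one-out evaluation described in the abstract can be illustrated with a minimal sketch. This is not BioSM's algorithm: the abstract does not specify the graph matching procedure, so a toy Jaccard similarity over hypothetical feature sets stands in for it, and all names, compounds, and the 0.5 threshold below are assumptions made for illustration only.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical representation: each compound is a set of structural features.
# A real tool would compare molecular graphs instead.
Features = FrozenSet[str]


def jaccard(a: Features, b: Features) -> float:
    """Toy similarity score; a placeholder for a graph-matching score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_biochemical(
    query: Features,
    scaffolds: Dict[str, Features],
    threshold: float = 0.5,
    similarity: Callable[[Features, Features], float] = jaccard,
) -> bool:
    """Return True if the query is similar enough to at least one scaffold."""
    return any(similarity(query, s) >= threshold for s in scaffolds.values())


def leave_one_out_rate(
    compounds: Dict[str, Features], threshold: float = 0.5
) -> float:
    """Fraction of compounds predicted biochemical when each one is held out
    of the scaffold list and scored against the remaining compounds."""
    if not compounds:
        return 0.0
    hits = 0
    for name, feats in compounds.items():
        others = {k: v for k, v in compounds.items() if k != name}
        if predict_biochemical(feats, others, threshold):
            hits += 1
    return hits / len(compounds)


# Made-up example data, not taken from KEGG or any other database.
compounds = {
    "A": frozenset({"ring", "OH", "COOH"}),
    "B": frozenset({"ring", "OH", "NH2"}),
    "C": frozenset({"ring", "OH", "COOH", "CH3"}),
    "D": frozenset({"Cl", "Br"}),
}

rate = leave_one_out_rate(compounds)
```

In this sketch every compound serves as both query and scaffold, whereas the experiment reported above evaluated 1388 KEGG compounds against a list of 1565 scaffolds; the held-out principle is the same, but the data handling is simplified.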

Journal: Journal of Chemical Information and Modeling
Publication Date: Feb 27, 2013
Citations: 30

