Abstract
While the entirety of ‘Chemical Space’ is huge (and assumed to contain between 10^63 and 10^200 ‘small molecules’), distinct subsets of this space can nonetheless be defined according to certain structural parameters. An example of such a subspace is the chemical space spanned by endogenous metabolites, defined as ‘naturally occurring’ products of an organism's metabolism. To understand this part of chemical space in more detail, we analyzed the chemical space populated by human metabolites in two ways. Firstly, we performed Principal Component Analysis (PCA), hierarchical clustering and scaffold analysis of metabolites and non-metabolites to determine which chemical features are characteristic of each class of compounds. Here we found that heteroatom content (both oxygen and nitrogen), as well as the presence of particular ring systems, was able to distinguish the two groups of compounds. Secondly, we established which molecular descriptors and classifiers are capable of distinguishing metabolites from non-metabolites by assigning a ‘metabolite-likeness’ score. The combination of MDL Public Keys and Random Forest exhibited the best overall classification performance, with an AUC value of 99.13%, a specificity of 99.84% and a selectivity of 88.79%. This performance is slightly better than that of previous classifiers. Interestingly, we found that drugs occupy two distinct areas of metabolite-likeness, one being more ‘synthetic’ and the other more ‘metabolite-like’. Also, on a truly prospective dataset of 457 compounds, 95.84% correct classification was achieved. Overall, we are confident that we have contributed to the task of classifying metabolites, as well as to a better understanding of metabolite chemical space.
This knowledge can now be used in the development of new drugs that need to resemble metabolites, and in our work particularly for assessing the metabolite-likeness of candidate molecules during metabolite identification in the metabolomics field.
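The abstract reports classifier performance as AUC and specificity. As a rough illustration of how such figures are typically computed from a classifier's scores, here is a minimal pure-Python sketch. The function names, the 0.5 threshold and the toy labels and scores are all assumptions made for illustration; they are not the authors' code or data. Sensitivity (the true positive rate) is included as the usual companion to specificity; how the paper defines ‘selectivity’ is not stated in this text.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count TP, FP, TN, FN, treating score >= threshold as 'metabolite' (1)."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def specificity(labels, scores, threshold=0.5):
    """Fraction of true negatives among all actual negatives."""
    _, fp, tn, _ = confusion_counts(labels, scores, threshold)
    return tn / (tn + fp)


def sensitivity(labels, scores, threshold=0.5):
    """Fraction of true positives among all actual positives."""
    tp, _, _, fn = confusion_counts(labels, scores, threshold)
    return tp / (tp + fn)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as half a win)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = metabolite, 0 = non-metabolite.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1]

print("specificity:", specificity(toy_labels, toy_scores))
print("sensitivity:", sensitivity(toy_labels, toy_scores))
print("AUC:", roc_auc(toy_labels, toy_scores))
```

The pairwise AUC formulation is quadratic in the number of compounds, which is fine for a small example; a rank-based implementation would be the more usual choice for large datasets.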
Highlights
The area of ‘Metabolomics’ is relatively young [1,2] and describes the large-scale analysis of metabolites
nuclear magnetic resonance (NMR) allows for a detailed characterization of the chemical structure of the known compound, and it is the preferred technique for unambiguous identification of a chemical structure
Molecules from the two datasets were standardized with PipelinePilot Student Edition 6.1 [36] using the ‘washing’ workflow suggested by Dobson et al. [30], which involved the selection of the largest fragment in the structure, the removal of salts and hydrogen atoms, and the standardization of charges and stereochemistry
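One step of the ‘washing’ workflow mentioned in the last highlight, keeping the largest fragment and thereby dropping salt counter-ions, can be sketched in plain Python on SMILES strings. This is only a crude approximation under stated assumptions: the input is taken to be a SMILES string with fragments separated by ‘.’, fragment size is estimated by a regex-based heavy-atom count, and the names `heavy_atom_count` and `largest_fragment` are hypothetical. It is not the Pipeline Pilot protocol, and it does not cover charge or stereochemistry standardization.

```python
import re

# Matches bracket atoms such as [Na+] or [nH], then the two-letter organic-subset
# atoms Cl and Br, then single-letter aliphatic and aromatic atoms.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")

# Explicit hydrogen written as a bracket atom, e.g. [H], [H+] or [2H].
BRACKET_HYDROGEN = re.compile(r"\[\d*H[+-]?\d*\]")


def heavy_atom_count(fragment):
    """Approximate number of non-hydrogen atoms in one SMILES fragment."""
    count = 0
    for token in ATOM_PATTERN.findall(fragment):
        if BRACKET_HYDROGEN.fullmatch(token):
            continue  # explicit hydrogens are not heavy atoms
        count += 1
    return count


def largest_fragment(smiles):
    """Return the dot-separated fragment with the most heavy atoms.

    On a tie, the first such fragment is returned.
    """
    fragments = smiles.split(".")
    return max(fragments, key=heavy_atom_count)


# Sodium acetate: the acetate anion is kept, the sodium counter-ion is dropped.
print(largest_fragment("CC(=O)[O-].[Na+]"))
# Ethylamine hydrochloride: the amine is kept, the chloride is dropped.
print(largest_fragment("Cl.CCN"))
```

A cheminformatics toolkit would normally be used for this step, since it parses the structure properly instead of pattern-matching on the string.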
Summary
The area of ‘Metabolomics’ is relatively young [1,2] and describes the large-scale analysis of (often human and endogenous) metabolites. It comprises both the analytical approaches employed, such as mass spectrometry (MS), and the analysis of the resulting data at the network and phenotype level. In practice, some metabolites with different lipophilicity can be detected by only one of the experimental techniques and not by the others [4,5,6,7,8,9]. MS offers high sensitivity and specificity and requires smaller amounts of sample, but provides less information about the chemical structure, namely its elemental composition and some structural fragments.