Large Chemical Space Research Articles

The structural identification of unknown biochemical compounds in complex biofluids remains a major challenge in metabolomics research. With LC/MS, there are currently two main options for solving this problem: searching small biochemical databases, which often do not contain the unknown of interest, or searching large chemical databases, which include large numbers of nonbiochemical compounds. Searching larger chemical databases (a larger chemical space) increases the odds of identifying an unknown biochemical compound, but only if nonbiochemical structures can be eliminated from consideration. In this paper we present BioSM, a cheminformatics tool that uses known endogenous mammalian biochemical compounds as scaffolds, together with graph matching methods, to identify endogenous mammalian biochemical structures in chemical structure space. The results of a comprehensive set of empirical experiments suggest that BioSM identifies endogenous mammalian biochemical structures with high accuracy. In a leave-one-out cross-validation experiment, BioSM correctly predicted 95% of 1388 Kyoto Encyclopedia of Genes and Genomes (KEGG) compounds as endogenous mammalian biochemicals using 1565 scaffolds. Analysis of two additional biological data sets, containing 2330 human metabolites (HMDB) and 2416 plant secondary metabolites (KEGG), yielded biochemical annotations for 89% and 72% of the compounds, respectively. When a data set of 3895 drugs (DrugBank and USAN) was tested, 48% of these structures were predicted to be biochemical. However, when a set of synthetic chemical compounds (Chembridge and Chemsynthesis databases) was examined, only 29% of the 458,207 structures were predicted to be biochemical. Moreover, BioSM predicted that 34% of 883,199 randomly selected compounds from PubChem were biochemical. We then expanded the scaffold list to 3927 biochemical compounds and reevaluated the above data sets to determine whether the number of scaffolds influenced model performance.
Although model sensitivity and specificity improved significantly with the larger scaffold list, the data set comparison results were very similar. These results suggest that additional biochemical scaffolds will not further improve our representation of biochemical structure space and that the model is reasonably robust. BioSM provides both a qualitative (yes/no) and a quantitative (ranking) method for endogenous mammalian biochemical annotation of chemical space and, thus, will be useful in identifying unknown biochemical structures in metabolomics. BioSM is freely available at http://metabolomics.pharm.uconn.edu.
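The abstract does not spell out BioSM's scoring function. What follows is therefore only a minimal Python sketch of the general idea: annotate a query as biochemical if it is sufficiently similar to some known biochemical scaffold, and estimate the annotation rate by leave-one-out evaluation. It deliberately swaps in a much simpler similarity than BioSM's graph matching: a Tanimoto coefficient over counts of labeled bond types, which ignores how the bonds are connected. Several pieces are hypothetical and are not taken from BioSM: the molecule representation, the function names (`bond_features`, `similarity`, `is_biochemical`, `leave_one_out_rate`), and the 0.7 threshold.

```python
from collections import Counter

# Hypothetical representation: (atom_labels, bonds), where atom_labels[i] is an
# element symbol and each bond is (i, j, bond_order). Hydrogens are implicit.


def bond_features(mol):
    """Count labeled bond types, e.g. ('C', 1, 'O') for a C-O single bond."""
    atoms, bonds = mol
    feats = Counter()
    for i, j, order in bonds:
        a, b = sorted((atoms[i], atoms[j]))  # make the label orientation-independent
        feats[(a, order, b)] += 1
    return feats


def similarity(mol1, mol2):
    """Multiset Tanimoto over bond types (a stand-in, NOT true graph matching)."""
    f1, f2 = bond_features(mol1), bond_features(mol2)
    shared = sum((f1 & f2).values())  # Counter & keeps the minimum count per key
    total = sum((f1 | f2).values())   # Counter | keeps the maximum count per key
    return shared / total if total else 0.0


def is_biochemical(query, scaffolds, threshold=0.7):
    """Yes/no call: does any scaffold reach the (hypothetical) similarity threshold?"""
    return any(similarity(query, s) >= threshold for s in scaffolds)


def leave_one_out_rate(compounds, threshold=0.7):
    """Fraction of known biochemicals still annotated when each one is held out."""
    hits = 0
    for name, mol in compounds.items():
        scaffolds = [m for other, m in compounds.items() if other != name]
        hits += is_biochemical(mol, scaffolds, threshold)
    return hits / len(compounds)


# Toy example molecules (heavy atoms only; benzene in one Kekule form).
ETHANOL = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
PROPANOL = (["C", "C", "C", "O"], [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
ACETIC_ACID = (["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
BENZENE = (["C"] * 6, [(k, (k + 1) % 6, 1 if k % 2 == 0 else 2) for k in range(6)])

if __name__ == "__main__":
    toy_set = {"ethanol": ETHANOL, "propanol": PROPANOL, "acetic acid": ACETIC_ACID}
    print(round(similarity(ETHANOL, PROPANOL), 3))
    print(is_biochemical(BENZENE, list(toy_set.values())))
    print(leave_one_out_rate(toy_set, threshold=0.6))
```

Running the script prints 0.667, False and 1.0. The maximum similarity to any scaffold doubles as a crude ranking score, loosely mirroring the qualitative/quantitative split described above. A faithful implementation would replace `similarity` with a connectivity-aware measure, such as a maximum-common-substructure score, and would calibrate the threshold on held-out data.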

Amphiphile self-assembly materials, formed from molecules that contain both a hydrophilic and a hydrophobic domain, have great potential for high-throughput and combinatorial approaches to discovery and development. However, the materials chemistry community has embraced these ideas far less than the medicinal chemistry community has. Although this situation is beginning to change, realizing the full potential of high-throughput approaches for self-assembling materials will require further development in synthesis, characterization, formulation, and application. A key factor that makes small-molecule amphiphiles promising building blocks for next-generation multifunctional materials is their ability to self-assemble into complex nanostructures through low-energy transformations. Scientists can potentially tune, control, and functionalize these structures, but only after establishing their inherent properties. Because both robotic materials handling and customized rapid characterization equipment are increasingly available, high-throughput solutions are now attainable. These address traditional development bottlenecks for self-assembling amphiphile materials, such as structural characterization and the assessment of end-use functional performance. A high-throughput methodology can streamline materials development workflows, in line with existing high-throughput discovery pipelines such as those used by the pharmaceutical industry in drug discovery. Chemists have identified several areas that are amenable to a high-throughput approach in amphiphile self-assembly materials development. These allow exploration not only of a large chemical, compositional, and structural space, but also of material properties and of formulation and application variables.
These areas include materials synthesis and preparation, formulation, characterization, and screening of performance for the desired end application. High-throughput data analysis is crucial at all stages to keep pace with data collection. In this Account, we describe high-throughput advances in the field of amphiphile self-assembly, focusing on nanostructured lyotropic liquid crystalline materials, which form when amphiphiles are added to a polar solvent. We outline recent progress in the automated preparation of amphiphile molecules and their nanostructured self-assembled systems, both in the bulk phase and in dispersed colloidal particulate systems. Once these systems are prepared, we can characterize them structurally by establishing their phase behavior in a high-throughput manner, using both laboratory techniques (infrared and polarized light microscopy) and synchrotron facilities (small-angle X-ray scattering). Additionally, we provide three case studies that demonstrate how chemists can use high-throughput approaches to evaluate the functional performance of amphiphile self-assembly materials. The high-throughput setup and characterization of large-matrix in meso membrane protein crystallization trials illustrates an application of bulk-phase self-assembling amphiphiles. For dispersed colloidal systems, two nanomedicine examples, drug delivery and magnetic resonance imaging agents, highlight advances in high-throughput preparation, characterization, and evaluation.
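To keep pace with synchrotron data collection, small-angle X-ray scattering patterns from lyotropic systems are commonly assigned to a phase by comparing the ratios of Bragg peak positions with those each phase allows. For example, the ratios are 1:2:3 for lamellar and 1:√3:2 for hexagonal. The Python sketch below shows only that indexing step and is not the authors' pipeline. The function name `assign_phase`, the 2% relative tolerance, and the tie-breaking rule (fewest allowed reflections missing within the observed q range) are assumptions.

```python
import math

# Allowed reflections for common lyotropic liquid crystalline phases, given as
# h^2 (lamellar), h^2 + hk + k^2 (2D hexagonal) or h^2 + k^2 + l^2 (cubic).
# Peak positions scale as q ~ sqrt(index), so ratios are sqrt(index / first index).
PHASE_INDICES = {
    "lamellar": [1, 4, 9, 16],                # 1 : 2 : 3 : 4
    "hexagonal": [1, 3, 4, 7, 9, 12],         # 1 : sqrt3 : 2 : sqrt7 : 3 : sqrt12
    "cubic Pn3m": [2, 3, 4, 6, 8, 9],
    "cubic Im3m": [2, 4, 6, 8, 10, 12],
    "cubic Ia3d": [6, 8, 14, 16, 20, 22],
}


def _matches(value, candidates, rel_tol):
    """True if value lies within rel_tol (relative) of any candidate ratio."""
    return any(abs(value - c) <= rel_tol * c for c in candidates)


def assign_phase(q_peaks, rel_tol=0.02):
    """Return the phase whose peak-ratio pattern best fits q_peaks, or None."""
    if len(q_peaks) < 2:
        return None  # a single peak cannot discriminate between phases
    q_sorted = sorted(q_peaks)
    observed = [q / q_sorted[0] for q in q_sorted]
    best_phase, best_missing = None, None
    for phase, indices in PHASE_INDICES.items():
        expected = [math.sqrt(i / indices[0]) for i in indices]
        # Every observed peak must be explained by an allowed reflection.
        if not all(_matches(o, expected, rel_tol) for o in observed):
            continue
        # Penalize allowed reflections inside the observed q range that are absent.
        q_limit = observed[-1] * (1 + rel_tol)
        missing = sum(
            1 for e in expected if e <= q_limit and not _matches(e, observed, rel_tol)
        )
        if best_missing is None or missing < best_missing:
            best_phase, best_missing = phase, missing
    return best_phase


if __name__ == "__main__":
    d_spacing = 5.0  # nm, hypothetical lamellar repeat distance
    lamellar_peaks = [2 * math.pi * n / d_spacing for n in (1, 2, 3)]
    print(assign_phase(lamellar_peaks))  # lamellar
```

Two choices resolve ambiguous patterns in this sketch. Every observed peak must be indexed, and among the phases that pass, the one with the fewest unexplained allowed reflections wins. That separates, for example, a 1:2 lamellar pattern from the 1:√3:2 hexagonal one. Real data would additionally need peak picking, handling of coexisting phases, and lattice-parameter fitting.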

Articles published on Large Chemical Space

BioSM: Metabolomics Tool for Identifying Endogenous Mammalian Biochemical Structures in Chemical Structure Space

High-Throughput Development of Amphiphile Self-Assembly Materials: Fast-Tracking Synthesis, Characterization, Formulation, Application, and Understanding

Discovery of Potent, Selective Multidrug and Toxin Extrusion Transporter 1 (MATE1, SLC47A1) Inhibitors Through Prescription Drug Profiling and Computational Modeling

Higher-order multicomponent reactions: beyond four reactants

Various cyclization scaffolds by a truly Ugi 4-CR

Advances in chromatography: Efficient profiling of crude extracts and isolation of natural products

HiTSEE KNIME: a visualization tool for hit selection and analysis in high-throughput screening experiments for the KNIME platform

Chemical and Biological Properties of Frequent Screening Hits

In silico fragment-based drug design

GARLig: A Fully Automated Tool for Subset Selection of Large Fragment Spaces via a Self-Adaptive Genetic Algorithm

In‐silico identification of high potential SSH‐2 specific inhibitors

Molecular Design of Porphyrin-Based Nonlinear Optical Materials

Mutagenic probability estimation of chemical compounds by a novel molecular electrophilicity vector and support vector machine

Systematic investigation of protein phase behavior with a microfluidic formulator

Bioaccumulation potential of persistent organic chemicals in humans

Neural networks as data mining tools in drug design

Recent Advances in Isocyanide‐Based Multicomponent Chemistry

Solid- and solution-phase synthesis of highly substituted pyrrolidine libraries