Molecular Substructures Research Articles

Over the past 10–15 years, high-throughput technologies developed by the pharmaceutical industry and related disciplines have produced large databases of drug efficacy, gene/protein expression levels, mutational status, and molecular structure information for potential drugs and drug targets. These databases further our understanding of normal versus disease states and are useful for drug discovery. With the recent availability of large public databases, notably those from US government agencies such as the National Center for Biotechnology Information and the National Cancer Institute, the emerging field of chemogenomics is providing abundant opportunities for new techniques in statistical analysis, data integration and large-scale data mining.

This is the first of two issues of Statistical Analysis and Data Mining to be published in 2009 focused on statistical methods for chemogenomics. The review by Liu and Verducci provides a good introduction to the field of chemogenomics as well as an overview of the many statistical, bio/chemo-informatic, and experimental subdisciplines used in modern drug discovery.

Several papers in this issue describe new statistical methods for relating molecular structure to drug efficacy. Because compounds that have similar molecular structure often have similar biological properties, chemical similarity searching, on the basis of features extracted from the molecular structure, has long been used to prioritize compounds for testing. The article by Liu and Verducci introduces methods of chemical similarity searching that typically start with an active compound as a reference structure and aim to retrieve additional active compounds from a database. If the search is based on multiple active compounds instead of a single compound, the proportion of active compounds retrieved from the database can be significantly enriched.

Turbo similarity searching (TSS) assumes that the nearest neighbors of the single reference compound provided by the user are active compounds and automatically uses them to conduct a multiple reference compound similarity search. Previous studies have shown that TSS can improve retrieval enrichment compared to a traditional similarity search. In this issue, Gardiner et al. compare the effectiveness of TSS over a variety of databases and use alternative structural descriptors for computing structural similarity.

Scheiber et al. present a method for dealing with multiple activity values (i.e. from a series of assays) with the aim of explaining the assay differences from a chemical perspective. The key idea is to create meta-categories from the differences in assay values and use the new (ordinal) categories as dependent variables to identify chemical properties or molecular substructures associated with the difference in profiles. In the chemogenomics setting, this technique could be used to identify molecular features common to compounds for which activity levels are associated with gene/protein expression signatures.

The effectiveness of a molecular descriptor set for identifying active compounds from a similarity search depends on the compound class. This dependency makes it difficult to select the most suitable descriptor set for a given search. Vogt et al. describe a method that combines a Bayesian scoring scheme for selection of structural and molecular property descriptors with an information-theoretic method for predicting recall rates. The procedure allows for the selection of search methods most likely to be successful for a given compound class and target database.

Gardner-Lubbe et al. discuss the advantages of biplots for visualization and analysis of microarray data. Biplots are typically used in conjunction with principal component analysis (PCA). However, the authors show that PCA biplots do not provide optimal data separation when used for exploring the differences between treatments and differentially expressed genes. As an alternative, they suggest biplots based on analysis of distance and illustrate their effectiveness in separating gene expression samples from three treatment groups.

As a whole, the papers in this issue take the reader from the fundamentals of how chemogenomic databases are constructed and searched, to a clever application of statistical analysis used to draw insight from them. The works reflect the best spirit of data mining for ideas, not just facts. The authors are thanked and congratulated.
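The turbo similarity searching idea summarized in the editorial above can be illustrated with a minimal sketch. It is not the procedure used by Gardiner et al.: fingerprints are assumed to be sets of "on" feature indices, similarity is assumed to be the Tanimoto coefficient, and the multiple-reference scores are combined by taking the maximum over references (one common fusion rule, chosen here as an assumption). The function names and the parameter `k` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

# Assumption: a fingerprint is the set of "on" feature indices of a compound.
Fingerprint = FrozenSet[int]


def tanimoto(a: Iterable[int], b: Iterable[int]) -> float:
    """Tanimoto coefficient between two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(references: List[Iterable[int]],
                      database: Dict[str, Iterable[int]]) -> List[str]:
    """Rank database compounds by their best similarity to any reference.

    Taking the maximum over references is an assumed fusion rule.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    scores = {
        name: max(tanimoto(fp, ref) for ref in references)
        for name, fp in database.items()
    }
    return sorted(scores, key=lambda name: (-scores[name], name))


def turbo_similarity_search(reference: Iterable[int],
                            database: Dict[str, Iterable[int]],
                            k: int = 2) -> List[str]:
    """Two-pass search: treat the k nearest neighbours as extra references."""
    first_pass = similarity_search([reference], database)
    # Assumption of the method: the nearest neighbours are themselves active.
    neighbours = [database[name] for name in first_pass[:k]]
    return similarity_search([reference, *neighbours], database)
```

In this sketch, a compound that shares few features with the original reference but many with one of its nearest neighbours moves up the ranking in the second pass, which is the enrichment effect the multiple-reference search aims for.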

Background: Graph theoretical methods are extensively used in the field of computational chemistry to search datasets of compounds to see if they contain particular molecular sub-structures or patterns. We describe a preliminary application of a graph theoretical method, developed in computational chemistry, to geographical epidemiology in relation to testing a prior hypothesis. We tested the methodology on the hypothesis that if a socioeconomically deprived neighbourhood is situated in a wider deprived area, then that neighbourhood would experience greater adverse effects on mortality compared with a similarly deprived neighbourhood which is situated in a wider area with generally less deprivation.

Methods: We used the Trent Region Health Authority area for this study, which contained 10,665 census enumeration districts (CED). Graphs are mathematical representations of objects and their relationships and within the context of this study, nodes represented CEDs and edges were determined by whether or not CEDs were neighbours (shared a common boundary). The overall area in this study was represented by one large graph comprising all CEDs in the region, along with their adjacency information. We used mortality data from 1988–1998, CED level population estimates and the Townsend Material Deprivation Index as an indicator of neighbourhood level deprivation. We defined deprived CEDs as those in the top 20% most deprived in the Region. We then set out to classify these deprived CEDs into seven groups defined by increasing deprivation levels in the neighbouring CEDs. 506 (24.2%) of the deprived CEDs had five adjacent CEDs and we limited pattern development and searching to these CEDs. We developed seven query patterns and used the RASCAL (Rapid Similarity Calculator) program to carry out the search for each of the query patterns. This program used a maximum common subgraph isomorphism method which was modified to handle geographical data.

Results: Of the 506 deprived CEDs, 10 were not identified as belonging to any of the seven groups because they were adjacent to a CED with a missing deprivation category quintile, and none fell within query Group 1 (a deprived CED for which all five adjacent CEDs were affluent). Only four CEDs fell within Group 2, which was defined as having four affluent adjacent CEDs and one non-affluent adjacent CED. The numbers of CEDs in Groups 3–7 were 17, 214, 95, 81 and 85 respectively. Age and sex adjusted mortality rate ratios showed a non-significant trend towards increasing mortality risk across Groups (Chi-square = 3.26, df = 1, p = 0.07).

Conclusion: Graph theoretical methods developed in computational chemistry may be a useful addition to the current GIS based methods available for geographical epidemiology but further developmental work is required. An important requirement will be the development of methods for specifying multiple complex search patterns. Further work is also required to examine the utility of using distance, as opposed to adjacency, to describe edges in graphs, and to examine methods for pattern specification when the nodes have multiple attributes attached to them.
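The graph representation described in the Methods (CEDs as nodes, shared boundaries as edges) can be sketched as follows. This is only an illustration under stated assumptions, not the RASCAL maximum common subgraph search the authors used: it simply counts affluent neighbours of each deprived CED that has exactly five adjacent CEDs. The quintile coding (1 = most affluent, 5 = most deprived), the choice of which quintile counts as "affluent", and all function names are hypothetical. The abstract defines only Groups 1 and 2 explicitly, so the sketch reports neighbour counts rather than assigning all seven groups.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def build_adjacency(shared_boundaries: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph: one node per CED, one edge per shared boundary."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in shared_boundaries:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def affluent_neighbour_counts(adjacency: Dict[str, Set[str]],
                              quintile: Dict[str, int],
                              degree: int = 5,
                              deprived: int = 5,
                              affluent: int = 1) -> Dict[str, Optional[int]]:
    """Count affluent neighbours of each deprived CED with `degree` neighbours.

    Assumed coding: quintile 1 = most affluent, 5 = most deprived.
    A CED maps to None when any neighbour has no quintile recorded,
    mirroring the CEDs the abstract reports as unclassifiable.
    """
    counts: Dict[str, Optional[int]] = {}
    for ced, neighbours in adjacency.items():
        if quintile.get(ced) != deprived or len(neighbours) != degree:
            continue  # only deprived CEDs with exactly `degree` neighbours
        neighbour_quintiles = [quintile.get(n) for n in neighbours]
        if any(q is None for q in neighbour_quintiles):
            counts[ced] = None
        else:
            counts[ced] = sum(q == affluent for q in neighbour_quintiles)
    return counts
```

Under these assumptions, a count of 5 corresponds to the abstract's Group 1 (all five adjacent CEDs affluent) and a count of 4 to Group 2. A full subgraph-isomorphism search becomes necessary once patterns depend on how the neighbours connect to each other, which simple counting ignores.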

Articles published on Molecular Substructures

Histidine 6.55 Is a Major Determinant of Ligand-Biased Signaling in Dopamine D2L Receptor

Discovering Interesting Molecular Substructures for Molecular Classification

Random molecular substructures as fragment-type descriptors

A topological substructural molecular design approach for predicting mutagenesis end-points of [formula omitted], [formula omitted]-unsaturated carbonyl compounds

Psoralen and Bergapten: In Silico Metabolism and Toxicophoric Analysis of Drugs Used to Treat Vitiligo

Mining Statistically Significant Molecular Substructures for Efficient Molecular Classification

Exploring Chemical Substructures Essential for hERG K+ Channel Blockade by Synthesis and Biological Evaluation of Dofetilide Analogues

The Evolutionary History of the Structure of 5S Ribosomal RNA

SPREAD—exploiting chemical features that cause differential activity behavior

Chemogenomic databases: Construction, search and analysis

A quantitative structure‐activity relationship for predicting metabolic biotransformation rates for organic chemicals in fish

Developing Statistical Diagnostic Tools For Discriminating Between Different Diffusive Modes Of Fluorescently Tagged Protein Complexes In Living Cells For Short Duration Trajectories

A graph-theory method for pattern identification in geographical epidemiology – a preliminary application to deprivation and mortality

Combining Cluster Analysis, Feature Selection and Multiple Support Vector Machine Models for the Identification of Human Ether‐a‐go‐go Related Gene Channel Blocking Compounds

Short-range order and collective dynamics of poly(vinyl acetate): A combined study by neutron scattering and molecular dynamics simulations

Topological Fragment Index for the Analysis of Molecular Substructures and Their Topological Environment in Active Compounds

Random Molecular Fragment Methods in Computational Medicinal Chemistry

Group Contribution Method for Thermodynamic Analysis of Complex Metabolic Networks

Characterization and Thermal Properties of Sol-Gel Processed PMMA/SiO₂ Hybrid Materials

The Origin and Evolution of tRNA Inferred from Phylogenetic Analysis of Structure
