Abstract

With the increasing amount of data made available in the chemical field, there is a strong need for systems that can compare and classify chemical compounds efficiently and effectively. The best existing approaches are based on the structure-activity relationship premise, which states that the biological activity of a molecule is strongly related to its structural or physicochemical properties. This work presents a novel approach to the automatic classification of chemical compounds by integrating semantic similarity with existing structural comparison methods. Our approach was assessed using the Matthews Correlation Coefficient of its predictions, and achieved values of 0.810 for blood-brain barrier permeability, 0.694 for P-glycoprotein substrate prediction, and 0.673 for estrogen receptor binding activity. These results show a significant improvement over existing methods, whose best performances were 0.628, 0.591, and 0.647, respectively. The integration of semantic similarity is thus shown to be a feasible and effective way to improve existing chemical compound classification systems. Among other possible uses, this tool can support the study of the evolution of metabolic pathways, the study of the correlation of metabolic networks with properties of those networks, and the improvement of ontologies that represent chemical information.
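For reference, the Matthews Correlation Coefficient used as the evaluation measure can be computed from the four cells of a binary confusion matrix. The sketch below is a minimal illustration of the standard formula, not the authors' evaluation code; the function name `mcc` and the convention of returning 0.0 when the denominator is zero are assumptions made for this example.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix.

    tp, tn, fp, fn are the counts of true positives, true negatives,
    false positives and false negatives. The result lies in [-1, 1].
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: when any marginal total is zero the coefficient is
    # undefined; this sketch returns 0.0 in that case.
    if denominator == 0:
        return 0.0
    return numerator / denominator
```

A value of 1 corresponds to perfect prediction, 0 to a prediction no better than chance, and -1 to total disagreement, which is why the measure is often preferred over plain accuracy when the two classes differ in size.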

Highlights

  • The recent publication of large-scale chemical information by resources such as PubChem, ChEMBL and Chemical Entities of Biological Interest (ChEBI) has increased the focus of the scientific community on the problem of chemical comparison

  • ChEBI is organized as an ontology that classifies chemical compounds, which we use to derive a semantic similarity measure that reflects the biological relevance of molecules

  • In an effort to use as much information as possible, we introduce the Chemical hybrid metric (Chym), a system that integrates structural and semantic information in a single hybrid metric, and we show the accuracy of the system in three distinct classification problems: deciding whether a compound crosses the blood-brain barrier, is a P-glycoprotein substrate, or is an estrogen receptor ligand


Summary

Introduction

The recent publication of large-scale chemical information by resources such as PubChem, ChEMBL and ChEBI has increased the focus of the scientific community on the problem of chemical comparison. An effective and accurate system for comparing and classifying chemical compounds is useful in a number of applications: it can help the understanding of the evolution of metabolic pathways [1]; it can improve information retrieval for disease, phenotype, and other models that contain references to chemical compounds; it can enhance the study and development of pharmacophores [2,3]; and it can aid toxicology, e.g. by estimating whether a given compound is, or has the potential to be, harmful to animals or humans without attempting a potentially harmful in vivo experiment [4]. Structure alone, however, does not always reflect biological role: both clavulanic acid and 3-carboxyphenyl phenylacetamidomethylphosphonate are β-lactamase inhibitors, despite their different structures (see Figure 1). To address this problem, we propose using the semantics of a chemical compound in the context of biological relevance to improve the existing methods, through the development of a novel hybrid metric that takes into account both structural and semantic information. We propose that considering semantic similarity improves the performance of classification algorithms.
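The idea of combining structural and semantic information can be sketched in a few lines. The example below is only an illustration under stated assumptions and is not the Chym metric itself: the text above does not say how the two similarities are computed or combined, so the Tanimoto coefficient over fingerprint feature sets, the ancestor-overlap semantic measure, the weighted average, and all function and parameter names (`tanimoto`, `ancestors`, `semantic_similarity`, `hybrid_similarity`, `weight`) are hypothetical choices made for this sketch.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ancestors(term: str, parents: dict) -> set:
    """Return a term together with all of its ancestors in an ontology.

    `parents` maps each term to an iterable of its direct parents
    (a hypothetical, simplified stand-in for an ontology such as ChEBI).
    """
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def semantic_similarity(t1: str, t2: str, parents: dict) -> float:
    """Assumed semantic measure: overlap of the two ancestor sets."""
    return tanimoto(ancestors(t1, parents), ancestors(t2, parents))


def hybrid_similarity(structural: float, semantic: float,
                      weight: float = 0.5) -> float:
    """Assumed combination: weighted average of the two similarities."""
    return weight * structural + (1.0 - weight) * semantic
```

With a measure of this kind, two compounds with dissimilar structures, such as the two β-lactamase inhibitors mentioned above, could still receive a high overall score if they are annotated under a shared class in the ontology.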
