Abstract

Background: Public chemical compound resources are growing rapidly, both in quantity and in the types of data representation. Understanding the relationship between the intrinsic features of chemical compounds and their protein targets is essential for evaluating potential protein-binding function in virtual drug screening. Previous studies proposed correlations between bioactivity profiles and target networks, especially when chemical structures were similar. Because effective quantitative methods for uncovering such correlations are lacking, it is necessary to integrate information from multiple data sources to produce a comprehensive assessment of the similarity between small molecules, and to use this integrated scheme to quantitatively uncover the relationship between compounds and their targets.

Results: In this study, a multi-view based clustering algorithm was introduced to quantitatively integrate compound similarity from both bioactivity profiles and structural fingerprints. First, hierarchical clustering was performed with the fused similarity on 37 compounds curated from PubChem. Compared with clustering in a single view, the integrated similarity increased the overall number of common targets within the fused classes, indicating that multi-view based clustering more effectively identifies clusters whose members share more common targets. Analysis of certain classes reveals that the two views complement each other in describing compounds, which helps to recover similar compounds that are missed when only a single view is applied. Then, a large-scale virtual drug screen was performed on 1267 compounds curated from the Connectivity Map (CMap) dataset based on the fused similarity, which yielded a better ranking than the single-view approach. These comprehensive tests indicate that combining different data representations achieves an improved assessment of target-specific compound similarity.

Conclusions: Our study presents an efficient, extendable and quantitative computational model for integrating different compound representations, and it is expected to provide new clues for improving virtual drug screening from various pharmacological properties. Scripts, supplementary materials and data used in this study are publicly available at http://lifecenter.sgst.cn/fusion/.
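
The idea of fusing two views before clustering can be illustrated with a minimal sketch. The abstract does not specify how the multi-view algorithm combines the views, so the weighted average used here is a stand-in assumption and not the authors' method. The function names, the `weight` parameter, the choice of Tanimoto and Pearson measures, and the toy data are all hypothetical.

```python
from itertools import combinations
from statistics import mean

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def pearson_similarity(x, y):
    """Pearson correlation of two bioactivity profiles, rescaled from [-1, 1] to [0, 1]."""
    mx, my = mean(x), mean(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = sum((p - mx) ** 2 for p in x) ** 0.5
    sy = sum((q - my) ** 2 for q in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.5  # correlation undefined for a flat profile: treat as neutral
    return (cov / (sx * sy) + 1) / 2

def fused_similarity(fps, profiles, weight=0.5):
    """Assumed fusion rule: weighted average of structural and bioactivity similarity."""
    n = len(fps)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        s = (weight * tanimoto(fps[i], fps[j])
             + (1 - weight) * pearson_similarity(profiles[i], profiles[j]))
        sim[i][j] = sim[j][i] = s
    return sim

def average_linkage(sim, n_clusters):
    """Agglomerative clustering: repeatedly merge the two most similar clusters."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_clusters:
        def link(pair):
            a, b = pair
            return mean(sim[i][j] for i in clusters[a] for j in clusters[b])
        a, b = max(combinations(range(len(clusters)), 2), key=link)
        clusters[a] += clusters[b]  # a < b, so deleting b leaves index a valid
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy data (invented): four compounds forming two obvious groups.
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
profiles = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.0], [0.1, 0.8, 1.0], [0.0, 0.9, 0.9]]
clusters = average_linkage(fused_similarity(fps, profiles), n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Setting `weight` to 1.0 or 0.0 reduces the sketch to a single-view clustering, which gives a simple baseline for the single-view versus fused comparison described above.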

Highlights

  • Public chemical compound resources are growing rapidly in both the quantity of data and the types of data generated, which offers a great opportunity to further mine the relationship between compounds and their targets

  • Besides the classic representations of small molecules, such as the various fingerprints that characterize chemical structure, public high-throughput experimental data on compound bioactivity are expanding with the development of online databases, including PubChem [3], Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) [4] and DrugBank (http://drugbank.ca/) [5], which provide an alternative way to characterize molecules based on bioactivity profiles

Summary

Introduction

Public chemical compound resources are growing rapidly, both in quantity and in the types of data representation. Understanding the relationship between the intrinsic features of chemical compounds and their protein targets is essential for evaluating potential protein-binding function in virtual drug screening. Several recent studies on the relationship between different compound features reported correlations between bioactivity profiles and target networks, especially when chemical structures were similar [2,6,7,8]. By combining public repositories of compound targets and compound bioactivity, these studies indicate that comparing bioactivity profiles can provide insight into the mode of action (MOA) at the molecular level, which will facilitate the knowledge-based discovery of novel compounds. Two important computational questions remain to be investigated: (1) Is there a quantitative relationship between compound features (bioactivity profile and structural features) and compound targets that can be described? (2) Since previous work implied that integrating multiple compound features may measure target-specific compound similarity better than any single feature type, how can such integration be optimized to quantitatively and automatically combine information from various views of compound representation, i.e., structural features, bioactivity features and others? In this study, we treat the description and integration of multiple compound features as a multi-view data representation and learning problem, and we aim to present a quantitative relationship between target-specific compound similarity and multi-view representations of compound features in an efficient multi-view learning scheme.
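
One hypothetical way to probe question (1) is to check whether compound pairs with high feature similarity also share more targets. The sketch below is an illustration under stated assumptions, not the procedure used in the study: it takes a precomputed similarity matrix and each compound's target set, then compares the mean Jaccard overlap of target sets for pairs above and below a similarity threshold. The function names, the `threshold` parameter, and the target labels are invented for the example.

```python
from itertools import combinations
from statistics import mean

def jaccard(a, b):
    """Jaccard overlap of two target sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def target_overlap_by_similarity(sim, targets, threshold=0.5):
    """Mean target overlap for compound pairs above and below a similarity threshold.

    Returns (mean_overlap_high, mean_overlap_low); an entry is None when
    no pair falls on that side of the threshold.
    """
    high, low = [], []
    for i, j in combinations(range(len(targets)), 2):
        bucket = high if sim[i][j] >= threshold else low
        bucket.append(jaccard(targets[i], targets[j]))
    return (mean(high) if high else None, mean(low) if low else None)

# Toy data (invented): compounds 0 and 1 are similar and share both targets.
sim = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
targets = [{"T1", "T2"}, {"T1", "T2"}, {"T3"}]
print(target_overlap_by_similarity(sim, targets))  # (1.0, 0.0)
```

A clearly larger first value than second on real data would be consistent with similarity in the chosen view tracking shared targets. Repeating the comparison per view and for a fused similarity is one simple way to contrast single-view and multi-view representations.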
