Abstract

Measuring molecular similarity is an essential part of many machine learning tasks in chemical informatics, and graph kernels provide effective similarity measures between molecules. Conventional graph kernels count the common subgraphs of specific types in the molecular graphs. This approach has two primary limitations: (i) the counting operation considers only exact subgraph matches, and (ii) most of the subgraphs are of little relevance to a given task. To address these limitations, we propose a new graph kernel that extends the subtree kernel originally proposed by Ramon and Gärtner (2003). The proposed kernel tolerates inexact matches between subgraphs by allowing atoms with similar local environments to be matched. In addition, it assigns each subgraph an importance weight according to its relevance to the task, which is determined in advance by a statistical test. These extensions are evaluated on classification and regression tasks that predict a wide range of pharmaceutical properties from molecular structures, with promising results.
