Similarity maps - a visualization strategy for molecular fingerprints and machine-learning methods

Sereina Riniker, Gregory A Landrum

doi:10.1186/1758-2946-5-43

Abstract

Fingerprint similarity is a common method for comparing chemical structures. Similarity is an appealing approach because, with many fingerprint types, it provides intuitive results: a chemist looking at two molecules can understand why they have been determined to be similar. This transparency is partially lost with the fuzzier similarity methods that are often used for scaffold hopping and tends to vanish completely when molecular fingerprints are used as inputs to machine-learning (ML) models. Here we present similarity maps, a straightforward and general strategy to visualize the atomic contributions to the similarity between two molecules or the predicted probability of an ML model. We show the application of similarity maps to a set of dopamine D3 receptor ligands using atom-pair and circular fingerprints as well as two popular ML methods: random forests and naïve Bayes. An open-source implementation of the method is provided.
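For the ML case, the same "remove an atom and see what changes" idea can be applied to a model's predicted probability instead of a similarity value. The following is a minimal sketch under stated assumptions, not the authors' implementation: the fingerprint is represented as a list of features, each a (bit, set of atom indices) pair, so that removing an atom drops every feature it participates in; the logistic `toy_predict_proba` with hand-set bit weights is a hypothetical stand-in for a trained random forest or naïve Bayes classifier; and the helper names are illustrative only. The atomic weight is taken as the predicted probability for the full fingerprint minus the probability once that atom's bits are removed.

```python
import math
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

# Hypothetical representation: each feature is (bit index, atoms it covers).
Feature = Tuple[int, FrozenSet[int]]


def bits_without_atom(features: List[Feature], atom: Optional[int] = None) -> Set[int]:
    """On-bits set by the features that do not involve `atom` (None keeps all)."""
    return {bit for bit, atoms in features if atom not in atoms}


def toy_predict_proba(bits: Set[int]) -> float:
    """Hypothetical stand-in for a trained classifier's P(active)."""
    bit_weights = {10: 2.0, 11: -1.0, 12: 0.5}  # made-up coefficients
    z = sum(bit_weights.get(b, 0.0) for b in bits)
    return 1.0 / (1.0 + math.exp(-z))


def model_atomic_weights(features: List[Feature], n_atoms: int,
                         predict: Callable[[Set[int]], float]) -> List[float]:
    """Weight of atom k = P(full fingerprint) - P(fingerprint without atom k)."""
    p_full = predict(bits_without_atom(features))
    return [p_full - predict(bits_without_atom(features, k))
            for k in range(n_atoms)]


# Hypothetical 3-atom molecule described by three atom-pair-like features.
features = [(10, frozenset({0, 1})), (11, frozenset({1, 2})), (12, frozenset({0, 2}))]
weights = model_atomic_weights(features, 3, toy_predict_proba)
print([round(w, 3) for w in weights])
```

Atom 0 receives the largest positive weight because removing it drops bit 10, which the toy model rewards most. Atom 2 receives a negative weight: removing it also drops bit 11, which the toy model penalizes, so the predicted probability rises.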

Highlights

Chemical structures are often represented by molecular fingerprints where structural features are converted to either bits in a bit vector or counts in a count vector
The “atomic weights” are generated by removing the bits belonging to the corresponding atom and comparing the resulting similarity with the similarity of the unmodified fingerprint
Similarity maps can be generated for every fingerprint that allows a backtracking of the bits to a corresponding atom or substructure
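The weighting step in the highlights can be written as w_k = S(A, B) − S(A without atom k, B), where A is the test molecule, B the reference, and S the fingerprint similarity. Below is a minimal Python sketch of that step, assuming Tanimoto similarity on sets of on-bits and the same hypothetical (bit, atoms) feature representation, which makes the "backtracking of the bits to a corresponding atom" explicit. The feature list, bit numbers, and function names are made up for illustration.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical representation: each feature is (bit index, atoms it covers),
# e.g. an atom pair covers two atoms, a circular environment several.
Feature = Tuple[int, FrozenSet[int]]


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def bits_without_atom(features: List[Feature], atom: Optional[int] = None) -> Set[int]:
    """On-bits set by the features that do not involve `atom` (None keeps all)."""
    return {bit for bit, atoms in features if atom not in atoms}


def similarity_map_weights(test_features: List[Feature], ref_bits: Set[int],
                           n_atoms: int) -> List[float]:
    """w_k = S(full test fingerprint, ref) - S(test fingerprint without atom k, ref)."""
    s_full = tanimoto(bits_without_atom(test_features), ref_bits)
    return [s_full - tanimoto(bits_without_atom(test_features, k), ref_bits)
            for k in range(n_atoms)]


# Hypothetical 3-atom test molecule and a reference fingerprint.
test_features = [(10, frozenset({0, 1})), (11, frozenset({1, 2})), (12, frozenset({0, 2}))]
ref_bits = {10, 11, 99}
weights = similarity_map_weights(test_features, ref_bits, 3)
print(weights)
```

Atom 1 takes part in both features shared with the reference (bits 10 and 11), so removing it costs the most similarity and it receives the largest weight. A positive weight marks an atom that increases the similarity; an atom whose features are absent from the reference would get a negative weight, because removing it shrinks the union without shrinking the intersection.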

Summary

Introduction

Chemical structures are often represented by molecular fingerprints where structural features are converted to either bits in a bit vector or counts in a count vector. This abstract representation allows computationally efficient handling and comparison of chemical structures. Depending on the descriptors used to generate the fingerprints, the interpretation of the resulting similarity may not be trivial. This problem worsens when machine-learning (ML) models are trained to predict the activity (or other properties) of new compounds: ML models often appear as complete “black boxes” that just output numeric predictions to their users. Though these predictions can be quite accurate, it has been shown that supplementing numeric predictions with additional information from the model can improve the ability of both expert and non-expert users to work
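To illustrate the bit-vector versus count-vector distinction, here is a minimal sketch of comparing both kinds of fingerprint with the Tanimoto coefficient. The choice of Tanimoto and the sample vectors are assumptions for illustration: for bits the sketch uses |A∩B| / |A∪B|, and for counts the common generalization Σ min(a_i, b_i) / Σ max(a_i, b_i).

```python
from collections import Counter
from typing import Set


def tanimoto_bits(a: Set[int], b: Set[int]) -> float:
    """Tanimoto on bit vectors stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tanimoto_counts(a: Counter, b: Counter) -> float:
    """Tanimoto on count vectors: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)  # Counter returns 0 for missing keys
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0


# Made-up fingerprints: feature 1 occurs twice in the first molecule.
counts_a = Counter({1: 2, 2: 1})
counts_b = Counter({1: 1, 3: 1})
print(tanimoto_bits(set(counts_a), set(counts_b)))  # 0.3333333333333333
print(tanimoto_counts(counts_a, counts_b))          # 0.25
```

Reducing the counts to presence or absence raises the similarity from 0.25 to about 0.33 here, because the bit form ignores that feature 1 occurs twice in the first molecule.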

Journal: Journal of Cheminformatics	Publication Date: Sep 24, 2013
Citations: 148	License type: CC BY 2.0

Similar Papers

Sensors support machine learning
Food Science and Technology | VOL. 33 | 01 Dec 2019

P125. Development of a novel ensemble machine learning algorithm for prediction of complications and readmission after anterior cervical spinal fusion
Akash A Shah ... Nelson Soohoo
The Spine Journal | VOL. 21 | 10 Aug 2021

P126. Development of a novel ensemble machine learning algorithm for prediction of complications and readmission after posterior cervical spinal fusion
Akash A Shah ... Nelson Soohoo
The Spine Journal | VOL. 21 | 10 Aug 2021

Seismic fragility analysis of steel moment frames using machine learning models
Hoang D Nguyen ... Myoungsu Shin
Engineering Applications of Artificial Intelligence | VOL. 126 | 15 Aug 2023
