Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity

Samuel J Webb, Paul Krause, Jonathan D Vessey, Brendan Howlin, Thierry Hanser

doi:10.1186/1758-2946-6-8

Open Access

https://doi.org/10.1186/1758-2946-6-8

Abstract

Background: A new algorithm has been developed to enable the interpretation of black box models. The algorithm is agnostic to the learning algorithm and open to all structure-based descriptors such as fragments, keys and hashed fingerprints. It has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is used to investigate the model’s behaviour on specific substructures present in the query, and an output is formulated summarising the causes of activation and deactivation. The algorithm can identify multiple causes of activation or deactivation, and can also identify localised deactivations where the prediction for the query is active overall. No loss in performance is seen because the prediction itself is unchanged; the interpretation is produced directly from the model’s behaviour for the specific query.

Results: Models were built using multiple learning algorithms, including support vector machine and random forest, on public Ames mutagenicity data with a variety of fingerprint descriptors. These models performed well in both internal and external validation, with accuracies around 82%. The models were then used to evaluate the interpretation algorithm. The interpretations produced link closely with understood mechanisms of Ames mutagenicity.

Conclusion: This methodology allows greater use to be made of the predictions from black box models and can expedite further study based on the output of a (quantitative) structure-activity model. Additionally, the algorithm could be used for chemical dataset investigation and for knowledge extraction and human SAR development.
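The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes the query has already been reduced to a small set of named structural features, enumerates every combination of those features, asks a black box model for a prediction on each combination, and then reports the smallest combinations predicted active together with any single feature that switches such a combination to inactive. The function names (`toy_model`, `feature_combinations`, `interpret`), the feature labels and the rules inside `toy_model` are all hypothetical and chosen purely for illustration; a real model would be a trained classifier queried on fingerprints of the fragments.

```python
from itertools import combinations


def toy_model(features):
    """Hypothetical stand-in for a trained black box classifier.

    Returns True (active) when an epoxide is present, or when an aromatic
    nitro group is present without a sulfonic acid group. These rules are
    invented for illustration only.
    """
    if "epoxide" in features:
        return True
    return "aromatic_nitro" in features and "sulfonic_acid" not in features


def feature_combinations(features):
    """Yield every non-empty combination of the features, smallest first."""
    ordered = sorted(features)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def interpret(features, predict_active):
    """Summarise which feature combinations drive the model's prediction."""
    features = frozenset(features)
    # Query the model once for every combination of the query's features.
    predictions = {c: predict_active(c) for c in feature_combinations(features)}

    # Activating: predicted active, and no smaller combination inside it is active.
    activating = [
        combo
        for combo, active in predictions.items()
        if active and not any(predictions[s] for s in predictions if s < combo)
    ]

    # Deactivating: adding this one feature to an activating combination
    # flips the model's prediction for that combination to inactive.
    deactivating = [
        (cause, feature)
        for cause in activating
        for feature in sorted(features - cause)
        if not predictions[cause | {feature}]
    ]

    return {
        "overall_active": predict_active(features),
        "activating": activating,
        "deactivating": deactivating,
    }


# Hypothetical query containing three structural features.
query = {"aromatic_nitro", "epoxide", "sulfonic_acid"}
report = interpret(query, toy_model)
```

For this toy query the report lists two separate causes of activation (the aromatic nitro group and the epoxide) and one localised deactivation (the sulfonic acid switching off the aromatic nitro group), while the overall prediction stays active. The model's prediction for the full query is never altered; the interpretation comes only from additional queries on parts of the input. Exhaustive enumeration grows exponentially with the number of features, so this sketch is only practical for very small feature sets.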

Highlights

A new algorithm has been developed to enable the interpretation of black box models
If our goal is to make a model with the highest possible predictive performance, we may choose a learning algorithm such as Random Forest (RF), Artificial Neural Network (ANN) or Support Vector Machine (SVM)
Here we discuss the performance of the learned models from cross validation and against external validation sets before discussing the interpretations produced against a selection of the validation data

Journal: Journal of Cheminformatics
Publication Date: Mar 25, 2014
Citations: 58
License type: CC BY 2.0
