Differences in learning characteristics between support vector machine and random forest models for compound classification revealed by Shapley value analysis

Friederike Maite Siemers, Jürgen Bajorath

doi:10.1038/s41598-023-33215-x

Journal: Scientific Reports
Publication Date: Apr 12, 2023
Citations: 11
License type: open-access

Affiliation: University of Bonn

Abstract

The random forest (RF) and support vector machine (SVM) methods are mainstays in molecular machine learning (ML) and compound property prediction. We have explored in detail how binary classification models derived using these algorithms arrive at their predictions. To this end, approaches from explainable artificial intelligence (XAI) are applicable, such as the Shapley value concept originating from game theory, which we adapted and further extended for our analysis. In large-scale activity-based compound classification using models derived from training sets of increasing size, RF and SVM with the Tanimoto kernel produced very similar predictions that could hardly be distinguished. However, Shapley value analysis revealed that their learning characteristics systematically differed and that chemically intuitive explanations of accurate RF and SVM predictions had different origins.
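The abstract does not detail the computation, so the following Python sketch is only an illustration under stated assumptions: fingerprints are represented as sets of on-bit indices, a feature that is "absent" from a coalition is modeled by switching its bit off, and the SVM is a hypothetical Tanimoto-kernel decision function with made-up support vectors, coefficients and intercept. It computes exact Shapley values by enumerating every coalition of on-bits, which is feasible only for a handful of bits; it is not presented as the adapted Shapley method used in the paper.

```python
from itertools import combinations
from math import factorial


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    union = len(a | b)
    # Assumption: two all-zero fingerprints are given similarity 0.0 to avoid 0/0.
    return len(a & b) / union if union else 0.0


def svm_decision(x, support_vectors, dual_coefs, intercept):
    """Kernel SVM decision value: sum_j (alpha_j * y_j) * K(x, sv_j) + b."""
    return sum(c * tanimoto(x, sv) for sv, c in zip(support_vectors, dual_coefs)) + intercept


def exact_shapley(x, value_fn):
    """Exact Shapley value of each on-bit of x by enumerating all coalitions.

    A coalition S is scored as value_fn applied to the fingerprint that keeps
    only the bits in S (all other bits off). Runtime grows as O(2^n).
    """
    features = sorted(x)
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other bits
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Weighted marginal contribution of bit i to coalition s.
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi


# Hypothetical toy model (not from the paper): two support vectors with made-up coefficients.
svs = [frozenset({1, 2, 3}), frozenset({3, 4})]
coefs = [1.0, -0.5]
b = 0.1


def model(fp):
    return svm_decision(fp, svs, coefs, b)


x = frozenset({1, 3, 4})
phi = exact_shapley(x, model)
print(phi)
# Efficiency check: attributions should sum to f(x) - f(all-zero fingerprint).
print(sum(phi.values()), model(x) - model(frozenset()))
```

By the efficiency property of Shapley values, the attributions of the individual bits add up to the difference between the prediction for the full fingerprint and for the all-zero reference, which the last line prints for comparison. The choice of the all-zero fingerprint as reference is itself an assumption of this sketch.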

Similar Papers

A comparison of random forest and support vector machine approaches to predict coal spontaneous combustion in gob
Changkui Lei ... Chimin Shu
Fuel | VOL. 239
10 Nov 2018

Modeling soil aggregate stability using support vector machines and multivariate linear regression
...
25 Apr 2015

Comparative study of different machine learning models in landslide susceptibility assessment: A case study of Conghua District, Guangzhou, China
Ao Zhang ... Yi-Yong Li
China Geology | VOL. 7
06 Feb 2024

Identification of the geographic origin of peaches by VIS-NIR spectroscopy, fluorescence spectroscopy and image processing technology
Qinyi Yang ... Huirong Xu
Journal of Food Composition and Analysis | VOL. 114
23 Aug 2022
