Chemmodlab: a cheminformatics modeling laboratory R package for fitting and assessing machine learning models

Jeremy R Ash, Jacqueline M Hughes-Oliver

doi:10.1186/s13321-018-0309-4

Open Access

Journal: Journal of Cheminformatics
Publication Date: Nov 28, 2018
Citations: 3
License type: open-access

Affiliation: North Carolina State University

Abstract

The goal of chemmodlab is to streamline the fitting and assessment pipeline for many machine learning models in R, making it easy for researchers to compare the utility of these models. Although chemmodlab focuses on methods for model fitting and assessment that have been accepted by experts in the cheminformatics field, all of its methods have broad utility for the machine learning community. chemmodlab contains several assessment utilities, including a plotting function that constructs accumulation curves and a function that computes many performance measures. The most novel feature of chemmodlab is the ease with which statistically significant performance differences among many machine learning models are presented by means of the multiple comparisons similarity plot. Differences are assessed using repeated k-fold cross validation, where blocking increases precision and multiplicity adjustments are applied. chemmodlab is freely available on CRAN at https://cran.r-project.org/web/packages/chemmodlab/index.html.

Highlights

It is commonplace for researchers across a variety of fields to fit machine learning models on complex data to make predictions.
A myriad of modeling methods implemented in R may be worthwhile for researchers to try.
Functions for computing molecular descriptors and applicability domain have been added; chemmodlab is organized into two successive components: (1) model fitting, which is primarily conducted via function ModelTrain, and (2) model assessment, which is conducted via function CombineSplits.

Summary

Introduction

It is commonplace for researchers across a variety of fields to fit machine learning models on complex data to make predictions. The Pharmacophore-Least Angle Regression (LAR) combination (AUC: .71) involves a highly interpretable linear model with a subset of the Pharmacophore descriptors selected. This .05 difference is small, and without additional investigation it is unclear whether it is statistically significant. By performing multiple cross validation splits and using these splits as a blocking factor to improve precision, chemmodlab is able to test for statistical significance of performance measure differences and visualize these results in a manner that can be interpreted by the user. The question this addresses is: if the experiment were repeated with changes to the training and/or test set, would the best performing model still be the best? Functions for computing molecular descriptors and applicability domain have been added; chemmodlab is organized into two successive components: (1) model fitting, which is primarily conducted via function ModelTrain, and (2) model assessment, which is conducted via function CombineSplits.
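The role of blocking can be illustrated with a small sketch. The Python code below is not chemmodlab code and does not use its R functions. The per-split AUC values are made up for illustration, and the helper names `unblocked_se` and `blocked_se` are hypothetical. It compares the standard error of a mean performance difference when cross validation splits are ignored with the standard error when each split is treated as a block, that is, when the two models are compared within the same split.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical AUCs for two models evaluated on the same five
# cross validation splits (illustrative numbers, not from the paper).
auc_a = [0.70, 0.78, 0.66, 0.74, 0.72]
auc_b = [0.66, 0.73, 0.60, 0.70, 0.66]


def unblocked_se(x, y):
    """SE of mean(x) - mean(y), treating the two samples as independent."""
    return sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))


def blocked_se(x, y):
    """SE of the mean within-split difference, using splits as blocks."""
    diffs = [a - b for a, b in zip(x, y)]
    return stdev(diffs) / sqrt(len(diffs))


mean_diff = mean(auc_a) - mean(auc_b)
se_unblocked = unblocked_se(auc_a, auc_b)
se_blocked = blocked_se(auc_a, auc_b)

print(f"mean AUC difference: {mean_diff:.3f}")
print(f"SE ignoring splits:  {se_unblocked:.4f}")
print(f"SE blocking on split: {se_blocked:.4f}")
```

Because both models tend to do well or poorly on the same splits, the split-to-split variation cancels in the within-split differences, so the blocked standard error is smaller and a small difference is easier to detect. This sketch leaves out the multiplicity adjustments that are needed when many models are compared at once.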

Results and discussion

[Tables and figures not reproduced; surviving labels: CID, Outcome, Number of compounds selected, Error Rate]
