Cautionary Guidelines for Machine Learning Studies with Combinatorial Datasets.

Andrew F Zahrt, Jeremy J Henle, Scott E Denmark

doi:10.1021/acscombsci.0c00118

Abstract

Regression modeling is becoming increasingly prevalent in organic chemistry as a tool for reaction outcome prediction and mechanistic interrogation. Frequently, to acquire the requisite amount of data for such studies, researchers employ combinatorial datasets to maximize the number of data points while limiting the number of discrete chemical entities required. An often-overlooked problem in modeling studies using combinatorial datasets is the tendency to fit on patterns in the datasets (i.e., the presence or absence of a reactant or catalyst) rather than to identify meaningful trends between descriptors and the response variable. Consequently, the generality and interpretability of such models suffer. This report illustrates these well-known pitfalls in a case study, demonstrates the necessary control experiments to identify when this property will be problematic, and suggests how to perform further validation to assess general applicability and interpretability of models trained using combinatorial datasets.
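The following is a minimal sketch of one kind of control split that the abstract alludes to, not the authors' actual protocol, which the abstract does not specify. It assumes a hypothetical dataset formed by pairing every catalyst with every substrate; the labels `cat0`, `sub0`, and the helper names are illustrative only. It contrasts a random hold-out, where every test component also appears in training, with a leave-one-component-out hold-out, where the test catalyst is entirely new.

```python
import itertools
import random

# Hypothetical component labels; the source does not list specific reagents.
catalysts = [f"cat{i}" for i in range(6)]
substrates = [f"sub{j}" for j in range(5)]

# A combinatorial dataset: every catalyst paired with every substrate (30 reactions).
reactions = list(itertools.product(catalysts, substrates))


def random_split(rows, n_test, seed):
    """Shuffle reactions and hold out n_test of them, ignoring component identity."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def leave_component_out(rows, held_out, position):
    """Hold out every reaction containing one component (position 0 = catalyst)."""
    train = [r for r in rows if r[position] != held_out]
    test = [r for r in rows if r[position] == held_out]
    return train, test


def fraction_seen(train, test, position):
    """Fraction of test reactions whose component at `position` also occurs in training."""
    seen = {r[position] for r in train}
    return sum(r[position] in seen for r in test) / len(test)


train_r, test_r = random_split(reactions, n_test=3, seed=0)
train_g, test_g = leave_component_out(reactions, "cat0", position=0)

# With only 3 reactions held out, no catalyst (5 reactions each) can vanish from training.
print(fraction_seen(train_r, test_r, position=0))  # 1.0
# Every reaction of cat0 is held out, so the model has never seen this catalyst.
print(fraction_seen(train_g, test_g, position=0))  # 0.0
```

Under these assumptions, a model that has only memorized the presence or absence of each component can still score well on the random split, because every test component was seen during training. Comparing performance on the two splits is one way to check whether descriptors carry information beyond component identity.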

Journal: ACS Combinatorial Science
Publication Date: Oct 1, 2020
Citations: 25
