Computational prediction of frequent hitters in target-based and cell-based assays

Conrad Stork, Neann Mathai, Johannes Kirchmair

doi:10.1016/j.ailsci.2021.100007

Abstract

Compounds interfering with high-throughput screening (HTS) assay technologies (also known as “badly behaving compounds”, “bad actors”, “nuisance compounds” or “PAINS”) pose a major challenge to early-stage drug discovery. Many of these problematic compounds are “frequent hitters”, and we have recently published a set of machine learning models (“Hit Dexter 2.0”) for flagging such compounds. Here we present a new generation of machine learning models which are derived from a large, manually curated and annotated data set. For the first time, these models cover cell-based assays in addition to target-based assays. Our experiments show that cell-based assays indeed behave differently from target-based assays with respect to hit rates and frequent hitters, and that dedicated models are required to produce meaningful predictions. In addition to these extensions and refinements, we explored a variety of additional setups for modeling, including the combination of four machine learning classifiers (i.e. k-nearest neighbors (KNN), extra trees, random forest and multilayer perceptron) with four sets of descriptors (Morgan2 fingerprints, Morgan3 fingerprints, MACCS keys and 2D physicochemical property descriptors). Testing on holdout data as well as on data sets of “dark chemical matter” (i.e. compounds that have been extensively tested in biological assays but have never shown activity) and known bad actors shows that the multilayer perceptron classifiers in combination with Morgan2 fingerprints outperform other setups in most cases. The best multilayer perceptron classifiers obtained Matthews correlation coefficients of up to 0.648 on holdout data. These models are available via a free web service.
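For readers unfamiliar with the Matthews correlation coefficient (MCC) reported above, the following is a minimal sketch of how it is computed for a binary classifier from the confusion matrix counts. The function name `matthews_corrcoef` and the toy labels are illustrative assumptions and are not taken from the paper; the labels merely stand in for "frequent hitter" (1) versus "not a frequent hitter" (0).

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Binary MCC: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif not truth and not pred:
            tn += 1
        elif pred:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy labels (1 = frequent hitter, 0 = not); not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
mcc = matthews_corrcoef(y_true, y_pred)
print(f"MCC = {mcc:.3f}")
```

The MCC ranges from -1 to 1, with 0 corresponding to a prediction no better than chance. Because it uses all four confusion matrix counts, it remains informative on imbalanced data, which is the typical situation when frequent hitters are a minority class.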

Journal: Artificial Intelligence in the Life Sciences	Publication Date: Aug 8, 2021
Citations: 4	License type: cc-by

Similar Papers

Hit Dexter 2.0: Machine-Learning Models for the Prediction of Frequent Hitters.
Conrad Stork ... Ya Chen
Journal of Chemical Information and Modeling | VOL. 59
09 Jan 2019

17 Chemical Genomic Tools for Understanding Gene Function and Drug Action
Corey Nislow ... Guri Giaever
Methods in Microbiology | VOL. 36
01 Jan 2007

Coast type based accuracy assessment for coastline extraction from satellite image with machine learning classifiers
Osman İsa Çelik ... Cem Gazioğlu
The Egyptian Journal of Remote Sensing and Space Science | VOL. 25
01 Feb 2022

Understanding False Positives in Reporter Gene Assays: in Silico Chemogenomics Approaches To Prioritize Cell-Based HTS Data
Thomas J Crisman ... John W Davies
Journal of Chemical Information and Modeling | VOL. 47
01 Jul 2007
