Machine Learning Distinguishes with High Accuracy between Pan-Assay Interference Compounds That Are Promiscuous or Represent Dark Chemical Matter

Swarit Jasial,Erik Gilberg,Jürgen Bajorath,Thomas Blaschke

doi:10.1021/acs.jmedchem.8b01404

Abstract

Assay interference compounds give rise to false-positives and cause substantial problems in medicinal chemistry. Nearly 500 compound classes have been designated as pan-assay interference compounds (PAINS), which typically occur as substructures in other molecules. The structural environment of PAINS substructures is likely to play an important role for their potential reactivity. Given the large number of PAINS and their highly variable structural contexts, it is difficult to study context dependence on the basis of expert knowledge. Hence, we applied machine learning to predict PAINS that are promiscuous and distinguish them from others that are mostly inactive. Surprisingly accurate models can be derived using different methods such as support vector machines, random forests, or deep neural networks. Moreover, structural features that favor correct predictions have been identified, mapped, and categorized, shedding light on the structural context dependence of PAINS effects. The machine learning models presented herein further extend the capacity of PAINS filters.

Full Text