Optimal Compounds Discovery by Design of Experiments and Algorithmic Evolution of Linear Models

Raphaël Dumeunier, Florent A Hascher

doi:10.2533/chimia.2013.71

Abstract

Based on the premise that, for a given class of related chemical compounds, there exists a relationship between their structure and their properties (i.e., activity), it is demonstrated herein that an elementary algorithm can readily identify, with simple models and without recourse to molecular descriptors, the most active compounds of a categorical, pre-defined space of molecules. In a case study using public experimental data on two thousand related molecules, D-optimal design of experiments first identified the best subset of compounds from which to construct simple models. These models then predicted a first generation of best candidates; preparing those candidates and adding them to the data set allowed the most active region of the space of interest to be explored. Allowing the algorithm to survive over successive generations ensured that most of the best (active) compounds had been prepared. A partial survival condition, followed by a complete termination criterion, helped to minimize the total number of compounds to prepare while identifying the n best individuals of the matrix.
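The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the 5 x 5 x 5 categorical space, the synthetic, purely additive activities, the function names (one_hot, greedy_d_optimal, evolve), the batch size, and the stopping rule (stop once every predicted top candidate has already been prepared), which only stands in for the survival and termination criteria mentioned above. The greedy determinant-based selection is likewise a simple stand-in for a true D-optimal exchange algorithm.

```python
import itertools
import numpy as np

def one_hot(space, levels):
    """Encode each compound (tuple of level indices) as intercept + dummy columns."""
    rows = []
    for compound in space:
        row = [1.0]
        for pos, n_levels in enumerate(levels):
            dummies = [0.0] * (n_levels - 1)  # level 0 is the reference level
            if compound[pos] > 0:
                dummies[compound[pos] - 1] = 1.0
            row.extend(dummies)
        rows.append(row)
    return np.array(rows)

def greedy_d_optimal(X, n_runs, ridge=1e-6):
    """Greedy approximation: repeatedly add the row that most increases det(X'X)."""
    chosen = []
    available = np.ones(len(X), dtype=bool)
    M = ridge * np.eye(X.shape[1])  # small ridge keeps M invertible early on
    for _ in range(n_runs):
        Minv = np.linalg.inv(M)
        # Matrix determinant lemma: det(M + x x') = det(M) * (1 + x' M^-1 x)
        gain = np.einsum("ij,jk,ik->i", X, Minv, X)
        gain[~available] = -np.inf
        best = int(np.argmax(gain))
        chosen.append(best)
        available[best] = False
        M += np.outer(X[best], X[best])
    return chosen

def evolve(X, measure, start, n_best=10, batch=10, max_generations=50):
    """Fit a linear model, 'prepare' the predicted best compounds, and repeat."""
    prepared = list(start)
    activity = {i: measure(i) for i in prepared}
    for _ in range(max_generations):
        y = np.array([activity[i] for i in prepared])
        coef, *_ = np.linalg.lstsq(X[prepared], y, rcond=None)
        ranking = np.argsort(-(X @ coef))  # highest predicted activity first
        new = [int(i) for i in ranking[:batch] if int(i) not in activity]
        if not new:  # assumed stopping rule: predicted top batch already prepared
            break
        for i in new:
            activity[i] = measure(i)
            prepared.append(i)
    top = sorted(activity, key=activity.get, reverse=True)[:n_best]
    return top, len(prepared)

# Synthetic demonstration: three positions with five substituents each.
levels = (5, 5, 5)
space = list(itertools.product(*(range(n) for n in levels)))
X = one_hot(space, levels)
rng = np.random.default_rng(0)
effects = [rng.normal(size=n) for n in levels]  # hypothetical additive effects
true_activity = np.array([sum(e[l] for e, l in zip(effects, c)) for c in space])

start = greedy_d_optimal(X, n_runs=20)
top, n_prepared = evolve(X, lambda i: float(true_activity[i]), start)
print(f"prepared {n_prepared} of {len(space)} compounds")
print("best compounds found:", [space[i] for i in top])
```

Because the synthetic activities are exactly additive, the linear model is exact after the initial design and the loop stops after one additional generation. With real data, which carry noise and non-additive effects, more generations would be expected, and the stopping rule would need to be chosen with care.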

Full Text