PREFER: A New Predictive Modeling Framework for Molecular Discovery

Jessica Lanini, Nikolas Fechner, Hubert Misztela, Sarah Lewis, Nikolaus Stiefl, Nadine Schneider, Richard Lewis, Finton Sirockin, Krzysztof Maziarz, Megan Stanley, Marwin Segler, Gianluca Santarossa

doi:10.1021/acs.jcim.3c00523

Abstract

Machine-learning and deep-learning models are used extensively in cheminformatics to predict molecular properties, reduce the need for direct measurements, and accelerate compound prioritization. However, the variety of setups and frameworks, together with the large number of molecular representations, makes it difficult to properly evaluate, reproduce, and compare these models. Here we present a new PREdictive modeling FramEwoRk for molecular discovery (PREFER), written in Python (version 3.7.7) and based on AutoSklearn (version 0.14.7), that allows comparison between different molecular representations and common machine-learning models. We provide an overview of the framework's design and show exemplary use cases and results for several representation-model combinations on diverse data sets, both public and in-house. Finally, we discuss the use of PREFER on small data sets. The framework's code is freely available on GitHub.

Full Text