Transfer learning for molecular property predictions from small datasets

Thorren Kirschbaum, Annika Bande

doi:10.1063/5.0214754

Abstract

Machine learning has emerged as a new tool in chemistry to bypass expensive experiments or quantum-chemical calculations, for example, in high-throughput screening applications. However, many machine learning studies rely on small datasets, making it difficult to efficiently implement powerful deep learning architectures such as message passing neural networks. In this study, we benchmark common machine learning models for the prediction of molecular properties on two small datasets, for which the best results are obtained with the message passing neural network PaiNN as well as SOAP molecular descriptors concatenated to a set of simple molecular descriptors tailored to gradient boosting with regression trees. To further improve the predictive capabilities of PaiNN, we present a transfer learning strategy that uses large datasets to pre-train the respective models and allows us to obtain more accurate models after fine-tuning on the original datasets. The pre-training labels are obtained from computationally cheap ab initio or semi-empirical models, and both datasets are normalized to mean zero and standard deviation one to align the labels’ distributions. This study covers two small chemistry datasets, the Harvard Organic Photovoltaics dataset (HOPV, HOMO–LUMO-gaps), for which excellent results are obtained, and the FreeSolv dataset (solvation energies), where this method is less successful, probably due to a complex underlying learning task and the dissimilar methods used to obtain pre-training and fine-tuning labels. Finally, we find that for the HOPV dataset, the final training results do not improve monotonically with the size of the pre-training dataset, but pre-training with fewer data points can lead to more biased pre-trained models and higher accuracy after fine-tuning.
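The label alignment and the pre-train/fine-tune sequence described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a one-parameter linear model fitted by gradient descent stands in for PaiNN, and all data, function names (`standardize`, `destandardize`, `fit_linear`), learning rates, and epoch counts are hypothetical choices made only for illustration. The point it shows is that each dataset is standardized with its own mean and standard deviation, so that cheap pre-training labels and expensive fine-tuning labels share a common scale before the weights are carried over.

```python
from statistics import mean, pstdev


def standardize(labels):
    """Return z-scored labels plus the (mean, std) needed to undo the scaling."""
    mu = mean(labels)
    sigma = pstdev(labels)
    if sigma == 0:
        raise ValueError("labels are constant; cannot standardize")
    return [(y - mu) / sigma for y in labels], (mu, sigma)


def destandardize(values, stats):
    """Map standardized values back to the original label scale."""
    mu, sigma = stats
    return [v * sigma + mu for v in values]


def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.05, epochs=200):
    """Gradient descent on mean squared error for y ~ w*x + b.

    Passing w and b from an earlier fit continues training from those
    values, which is the toy analogue of fine-tuning a pre-trained model.
    """
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: many cheap labels, few expensive labels.
x_pre = [i / 10 for i in range(-10, 11)]
y_cheap = [3.0 * x + 1.0 for x in x_pre]      # stand-in for cheap computed labels
x_fine = [-1.0, 0.0, 1.0]
y_exact = [5.2 * x + 40.0 for x in x_fine]    # different scale and offset

# Each dataset is standardized with its own statistics.
z_cheap, stats_cheap = standardize(y_cheap)
z_exact, stats_exact = standardize(y_exact)

# Pre-train on the large set, then fine-tune briefly on the small set.
w0, b0 = fit_linear(x_pre, z_cheap)
w1, b1 = fit_linear(x_fine, z_exact, w=w0, b=b0, epochs=20)

# Predictions are mapped back to the fine-tuning label scale.
pred = destandardize([w1 * x + b1 for x in x_fine], stats_exact)
```

Standardizing each dataset separately is what lets the pre-trained weights serve as a useful starting point even when the two label sets differ in offset and units; the statistics of the fine-tuning set are kept so that predictions can be converted back afterwards.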

Journal: AIP Advances
Publication Date: Oct 1, 2024
License type: cc-by

