Simpler is Better: How Linear Prediction Tasks Improve Transfer Learning in Chemical Autoencoders

Nicolae C Iovanac, Brett M Savoie

doi:10.1021/acs.jpca.0c00042

Abstract

Transfer learning is a subfield of machine learning that leverages proficiency in one or more prediction tasks to improve proficiency in a related task. For chemical property prediction, transfer learning models are a promising way to address the data scarcity that limits many properties, because they can draw on potentially abundant data from one or more adjacent applications. Transfer learning models typically use a latent variable that is common to several prediction tasks and provides a mechanism for information exchange between tasks. For chemical applications, it is still largely unknown how correlation between the prediction tasks affects performance, how many tasks can be trained simultaneously in these models before performance degrades, and whether transfer learning positively or negatively affects ancillary model properties. Here we investigate these questions using an autoencoder latent space as the latent variable in transfer learning models that predict properties from the QM9 data set, supplemented with semiempirical quantum chemistry calculations. We demonstrate that property prediction can, counterintuitively, be improved by using a simpler linear predictor model, which forces the latent space to organize linearly with respect to each property. In data-scarce prediction tasks, the transfer learning improvement is dramatic, whereas in data-rich prediction tasks, transfer learning appears to have little adverse impact on prediction performance. The transfer learning approach demonstrated here thus represents a highly advantageous supplement to property prediction models with no downside in implementation.
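To make the idea of linear predictors acting on a shared latent space concrete, the following is a minimal sketch, not the authors' implementation. It assumes each molecule already has a latent vector and a reconstruction error from some autoencoder, that each property is predicted as a linear function of the latent vector, and that missing labels (the data-scarce case) are simply skipped. The function names, the `alpha` weighting, and the mean-squared-error form of the loss are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Head = Tuple[Sequence[float], float]  # (weights, bias) of one linear predictor


def linear_head(z: Sequence[float], w: Sequence[float], b: float) -> float:
    """Predict one property as a linear function of the latent vector z."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


def multitask_loss(
    latents: List[Sequence[float]],
    labels: List[Sequence[Optional[float]]],
    heads: List[Head],
    recon_errors: Sequence[float],
    alpha: float = 1.0,
) -> float:
    """Reconstruction loss plus a masked squared error for each property.

    labels[i][t] is None when property t is unknown for molecule i, so a
    data-scarce task only contributes through the molecules that have labels.
    alpha (an assumed hyperparameter) weights the property terms.
    """
    # Mean reconstruction error over all molecules (supplied by the autoencoder).
    recon = sum(recon_errors) / len(recon_errors)

    prop = 0.0
    for t, (w, b) in enumerate(heads):
        # Squared errors only for molecules where property t is labeled.
        sq = [
            (linear_head(z, w, b) - y[t]) ** 2
            for z, y in zip(latents, labels)
            if y[t] is not None
        ]
        if sq:
            prop += sum(sq) / len(sq)

    return recon + alpha * prop
```

Because every head is linear in the latent vector, minimizing the property terms jointly with the reconstruction term can only succeed if the encoder arranges molecules so that each property varies linearly across the latent space, which is the organizing effect the abstract describes.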

Full Text