An Improved Machine Learning Model for Pure Component Property Estimation

Xinyu Cao,Anjan Tula,Venkat Venkatasubramanian,Ming Gong,Xi Chen,Rafiqul Gani

doi:10.1016/j.eng.2023.08.024

Abstract

Information on the physicochemical properties of chemical species is an important prerequisite when performing tasks such as process design and product design. However, the lack of extensive data and high experimental costs hinder the development of prediction techniques for these properties. Moreover, accuracy and predictive capabilities still limit the scope and applicability of most property estimation methods. This paper proposes a new Gaussian process-based modeling framework that aims to manage a discrete and high-dimensional input space related to molecular structure representation with the group-contribution approach. A warping function is used to map discrete input into a continuous domain in order to adjust the correlation between different compounds. Prior selection techniques, including prior elicitation and prior predictive checking, are also applied during the building procedure to provide the model with more information from previous research findings. The framework is assessed using datasets of varying sizes for 20 pure component properties. For 18 out of the 20 pure component properties, the new models are found to give improved accuracy and predictive power in comparison with other published models, with and without machine learning.

Full Text