Abstract

The mathematical modeling of correlations between target properties and their influencing factors, particularly modeling that allows inverse analysis, is an essential part of molecular, material, and process design. In contrast to approaches that rely on pseudo-inverse analysis, Gaussian mixture regression (GMR), which assumes that the relationships between variables can be represented as a mixture of Gaussian distributions, allows direct inverse analysis. However, because this model optimizes the means and variance–covariance matrices of all variables, parameter estimation becomes increasingly difficult as the number of variables grows. Herein, this drawback is addressed by transforming the explanatory variables X into latent variables Z before GMR modeling. Because the inverse (Z to X) transformation is necessary for direct inverse analysis, principal component analysis (PCA) and a deep autoencoder (DAE) are employed as dimensionality reduction methods. After X is transformed to Z with PCA or DAE, a GMR model is constructed between Z and the objective variables Y, and the proposed method is therefore denoted PCA-GMR or DAE-GMR, respectively. Because Z values can be predicted by inputting Y values into the GMR model and then transformed back to X values, direct inverse analysis remains possible. Given that unlabeled data can be used to construct the PCA and DAE models, the proposed methods can also serve as semi-supervised learning techniques. The predictive abilities of PCA-GMR and DAE-GMR are verified using molecular, material, and spectral datasets and surpass those of traditional GMR on all datasets, with a maximum reduction in prediction error of 63%.
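As an illustration of the workflow summarized above, the sketch below implements the PCA-GMR idea with NumPy, SciPy, and scikit-learn. The toy data, component counts, variable names, and the hand-written Gaussian-mixture conditional mean are assumptions made for this example rather than the authors' implementation; DAE-GMR would replace the PCA encode/decode steps with a trained autoencoder.

```python
# Minimal PCA-GMR sketch (illustrative assumptions, not the paper's code).
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy stand-ins for explanatory variables X and objective variables Y.
X = rng.normal(size=(200, 30))
Y = X[:, :2] @ np.array([[1.0], [0.5]]) + 0.1 * rng.normal(size=(200, 1))

# Step 1: compress X into latent variables Z; PCA retains an inverse
# transform, which is what makes direct inverse analysis possible.
pca = PCA(n_components=5).fit(X)
Z = pca.transform(X)
n_z = Z.shape[1]

# Step 2: fit a Gaussian mixture on the joint space [Z, Y].
gmm = GaussianMixture(n_components=3, random_state=0).fit(np.hstack([Z, Y]))

def gmr_conditional_mean(observed, given="y"):
    """Conditional mean of the unobserved block given the observed one.

    given="y": predict Z from Y (inverse analysis).
    given="z": predict Y from Z (forward prediction).
    """
    if given == "y":
        g, t = slice(n_z, None), slice(None, n_z)
    else:
        g, t = slice(None, n_z), slice(n_z, None)
    weights, means = [], []
    for pi, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        mu_g, mu_t = mu[g], mu[t]
        cov_gg, cov_tg = cov[g, g], cov[t, g]
        # Component responsibility given the observed block.
        weights.append(pi * multivariate_normal.pdf(observed, mean=mu_g, cov=cov_gg))
        # Conditional mean of the target block for this component.
        means.append(mu_t + cov_tg @ np.linalg.solve(cov_gg, observed - mu_g))
    weights = np.asarray(weights) / np.sum(weights)
    return weights @ np.asarray(means)

# Inverse analysis: choose a target property value, predict Z, decode to X.
z_hat = gmr_conditional_mean(np.array([1.2]), given="y")
x_candidate = pca.inverse_transform(z_hat)

# Forward prediction: estimate Y for a new X through the same joint model.
y_hat = gmr_conditional_mean(pca.transform(X[:1])[0], given="z")
```

Both prediction directions use the same joint mixture over [Z, Y]; only the conditioning block changes, which is what the abstract means by direct inverse analysis.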
