Gaussian Process Latent Variable Model-Based Multi-Output Modeling of Incomplete Data

Dongping Du, Jianguo Wu, Chao Wang, Zhiyong Hu

doi:10.1109/tase.2023.3251386

Abstract

The rapid development of sensor technologies allows the acquisition of high-dimensional sensing data. Multi-output modeling techniques have been developed to leverage these data for decision making. However, the data often contain segments of missing values, which cause substantial information loss and thus degrade modeling performance. This study explores the missing pattern and the correlation structure of missing segments and exploits as much useful information in the data as possible to improve multi-output modeling accuracy. Specifically, a new multi-output modeling method is developed based on the Gaussian Process Latent Variable Model (GPLVM). A decision score is developed to select an optimal modeling strategy, and a tailored Expectation-Maximization (EM) algorithm based on GPLVM is then designed to estimate the missing segments while optimizing the model parameters. The proposed method demonstrates superior performance in both a simulation study and a case study, which makes it a powerful tool for enabling process automation.

Note to Practitioners: In real-life applications, missing values are common in multi-output sensing data and greatly affect data-driven decision making. Modeling such data becomes more difficult when consecutive observations within or across different outputs are missing. Existing methods often discard the missing values and extract information only from the available observations. However, the pattern of missing values may itself carry important information that can improve modeling performance. This research develops a new framework based on GPLVM to model multi-output data with segmented missing patterns. A tailored EM algorithm is developed to iteratively impute the missing values and optimize the model parameters. In addition, a decision score that quantifies both the missing pattern and the correlation is designed to determine an optimal modeling strategy. The proposed method can benefit applications across many industries that require modeling of incomplete multi-output data, especially when the data contain many segments of missing observations.
