Low-Error Data Recovery Based on Collaborative Filtering With Nonlinear Inequality Constraints for Manufacturing Processes

Bo-Wei Chen, Wei-Cheng Ye

doi:10.1109/tase.2020.3012426

Abstract

This study proposes a data recovery model in which substituted values are further limited by nonlinear inequality constraints so that they approximate the ground truth. The objective is to generate substituted values for multiple factors while simultaneously considering their lower/upper bounds, data means, and nonlinearity. This is critical when data must fall inside a nonlinear range, e.g., a partial hypersphere centered at a given mean. To address this, the study proposes collaborative filtering with nonlinear inequality constraints. The proposed method consists of three steps. First, the system finds class-dependent and box-bounded imputation basis factors for an incomplete data set. Class-dependent bases can reflect data domains well. Second, class-dependent imputation coefficients are located by the proposed nonnegative coefficient discovery with nonlinear inequality constraints. This step limits the search space and avoids generating out-of-range substituted values. Finally, constrained iterative projection pursuit is proposed for measuring the quality of recovered data by examining reconstruction residuals. By using both the nonlinear inequality constraints and the constrained iterative projection pursuit, the system can recover data while satisfying the multifactor nonlinear coeffects required by manufacturers. Experimental results showed that the proposed method generated substituted values with lower root-mean-squared errors. In addition, errors were reduced by at least 10.06% on average relative to the baselines. Furthermore, the classification accuracy of the proposed method after data imputation was higher than that of the baselines. These findings indicate that the proposed method can approximate the characteristics of the data when missing values are present.
Note to Practitioners: This work was motivated by the problem of missing values in heterogeneous industrial sensor readings. When data recovery is performed, the process should simultaneously consider multifactor nonlinearity, lower/upper bounds, historical references, and the divergence between substituted values and references, in order to reconstruct as many of the original sensor readings as possible. Existing approaches generally handle linear or nonlinear equality constraints but not the nonlinear inequality constraints described above. This work designs a self-dictionary method (class-dependent and box-bounded imputation basis factors along with constrained iterative projection pursuit) for finding substituted values. Real industrial experiments were conducted based on $\mathcal{L}_{2}$-norms. Future research will address the design for other norms.
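The constraint geometry mentioned in the abstract, a box of lower/upper bounds intersected with a hypersphere around a given mean, can be illustrated with a minimal sketch. This is not the authors' algorithm: it only shows how a candidate substituted value could be pushed into such a "partial hypersphere" by alternating projections onto the two convex sets. All function names, the radius parameter, and the example numbers are hypothetical, and the result is a feasible point rather than necessarily the closest feasible point.

```python
import math

def project_box(x, lower, upper):
    """Clip each coordinate to its [lower, upper] interval."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

def project_ball(x, center, radius):
    """Project x onto the closed ball ||x - center||_2 <= radius."""
    diff = [v - c for v, c in zip(x, center)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= radius:
        return list(x)
    scale = radius / dist
    return [c + d * scale for c, d in zip(center, diff)]

def constrain_imputation(x, lower, upper, center, radius, iters=200, tol=1e-12):
    """Alternate ball and box projections until the point stops moving.

    Assumes the box and the ball overlap; otherwise no feasible point exists.
    """
    current = list(x)
    for _ in range(iters):
        nxt = project_box(project_ball(current, center, radius), lower, upper)
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(nxt, current)))
        current = nxt
        if shift <= tol:
            break
    return current

def is_feasible(x, lower, upper, center, radius, eps=1e-6):
    """Check both the box bounds and the distance-to-mean constraint."""
    in_box = all(lo - eps <= v <= hi + eps for v, lo, hi in zip(x, lower, upper))
    dist = math.sqrt(sum((v - c) ** 2 for v, c in zip(x, center)))
    return in_box and dist <= radius + eps

# Hypothetical two-factor example: a quarter disc of radius 1 in the unit box.
lower, upper = [0.0, 0.0], [1.0, 1.0]
center, radius = [0.0, 0.0], 1.0
raw_guess = [2.0, -0.5]  # an out-of-range candidate value
recovered = constrain_imputation(raw_guess, lower, upper, center, radius)
```

Alternating projections were chosen here only because they need no external solver. A faithful implementation of the paper's coefficient search would instead optimize nonnegative coefficients over the basis factors under these constraints, which this sketch does not attempt.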

Full Text