Modern industrial systems routinely generate data in high volume at high velocity. These high-dimensional data streams (HDDS) give quality personnel fine-grained information for root cause investigation when a system fault occurs. The goal of fault analysis using HDDS is twofold: (1) identify the abnormal data streams and (2) locate the change point at which the processes go out of control. Existing research has largely addressed these two issues separately. In this article, we propose a unified framework by formulating the problem as optimal control of hierarchical missed discovery rates in multiple classifications. Theoretically, we establish that our approach minimizes the number of false discoveries while controlling the missed discovery rates at desired levels. Numerically, we develop a computationally efficient algorithm for solving the optimization problem and demonstrate its superior performance over existing methods. A data-driven version of the proposed approach is suggested as well. An application to a real data set in semiconductor manufacturing shows that our approach works well in practice.