Abstract

With the rapid advancement of in-process measurements and sensor technology driven by zero-defect manufacturing applications, high-dimensional heterogeneous processes that continuously collect data streams with distinct physical characteristics frequently appear in modern industries. Such large-volume, high-dimensional data place heavy demands on data collection, transmission, and analysis in practice. Practitioners therefore often need to decide which informative data streams to observe at each data acquisition time, given the resource constraints, which poses significant challenges for multivariate statistical process control and quality improvement. In this article, we propose a generic online nonparametric monitoring and sampling scheme to quickly detect mean shifts occurring in heterogeneous processes when only partial observations are available at each acquisition time. Our key idea is to seamlessly integrate the Thompson sampling (TS) algorithm with a quantile-based nonparametric cumulative sum (CUSUM) procedure to construct local statistics for all data streams based on the partially observed data. Furthermore, we develop a global monitoring scheme that uses the sum of the top-r local statistics, which can quickly detect a wide range of possible mean shifts. Tailored to monitoring heterogeneous data streams, the proposed method balances exploration, which searches unobserved data streams for possible mean shifts, with exploitation, which focuses on highly suspicious data streams for quick shift detection. Comprehensive simulations and a case study are conducted to evaluate the performance of the proposed method and demonstrate its superiority.

Note to Practitioners — This paper is motivated by the critical challenge of online process monitoring under cost-effectiveness and resource constraints in practice (e.g., a limited number of sensors, limited transmission bandwidth or energy, and limited processing time). Unlike existing methodologies, which rely on restrictive assumptions (e.g., normally distributed or exchangeable data streams) or require historical full observations of all data streams to be available offline for training, this paper proposes a novel monitoring and sampling strategy that allows practitioners to cost-effectively monitor high-dimensional heterogeneous data streams that capture distinct physical characteristics and follow different distributions. To implement the proposed methodology, it is necessary (i) to estimate sample quantiles for each data stream offline from historical in-control data; (ii) to determine which data streams to observe at each acquisition time based on the resource constraints; and (iii) to automatically screen out the suspicious data streams to form the global monitoring statistic. Experimental results from simulations and a case study show that the proposed method substantially outperforms existing methods in reducing detection delay and effectively handling heterogeneous data streams.
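The abstract describes a three-part scheme: quantile-based nonparametric CUSUM local statistics, Thompson sampling to choose which streams to observe under the resource constraint, and a top-r sum as the global statistic. The sketch below illustrates that structure only; the paper's exact quantile-based CUSUM update, TS posterior, and threshold calibration are not reproduced in the abstract, so the median-sign update, the Beta-posterior bookkeeping, and the constants p, q, r, k, and h here are all illustrative assumptions rather than the authors' construction.

```python
import numpy as np

# Illustrative sketch only: the paper's exact quantile-based CUSUM,
# TS posterior, and threshold calibration are simplified stand-ins.

rng = np.random.default_rng(0)

p, q, r = 20, 5, 3        # p streams; observe q per time step; sum top-r
h = 10.0                  # alarm threshold (would be calibrated to a target in-control ARL)
k = 0.5                   # CUSUM allowance

# Hypothetical in-control medians estimated offline from historical data (step i);
# zeros match the N(0, 1) in-control streams simulated below.
ic_median = np.zeros(p)

local = np.zeros(p)                    # local CUSUM statistic per stream
alpha, beta = np.ones(p), np.ones(p)   # Beta posteriors over "stream is shifted"

def step(x_full):
    """One acquisition time: sample q streams, update CUSUMs, test the top-r sum."""
    # Exploration/exploitation: draw from each posterior and observe
    # the q streams with the largest draws (step ii).
    draws = rng.beta(alpha, beta)
    observed = np.argsort(draws)[-q:]
    for j in observed:
        # Nonparametric ingredient (simplified): the sign of exceeding the
        # in-control median stands in for the quantile-based categorization.
        s = 1.0 if x_full[j] > ic_median[j] else -1.0
        local[j] = max(0.0, local[j] + s - k)   # one-sided CUSUM update
        # Posterior update: a growing local statistic counts as shift evidence.
        if local[j] > k:
            alpha[j] += 1
        else:
            beta[j] += 1
    global_stat = np.sort(local)[-r:].sum()     # step (iii): sum of top-r
    return global_stat > h

# Usage: feed one p-dimensional observation vector per acquisition time.
for t in range(200):
    x = rng.normal(size=p)
    x[3] += 2.0 if t >= 100 else 0.0            # mean shift in stream 3 at t = 100
    if step(x):
        print(f"Alarm at t={t}")
        break
```

Before the shift, the Beta draws keep the q observed streams rotating (exploration); once stream 3 accumulates evidence, its posterior concentrates and it is sampled almost every step (exploitation), which is the balance the abstract describes.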
