A two-stage online monitoring procedure for high-dimensional data streams

Jun Li

doi:10.1080/00224065.2018.1507562

Abstract

Advances in computing and data acquisition technologies have made it possible to collect high-dimensional data streams in many fields. Efficient online monitoring tools that can correctly identify abnormal data streams in such data are highly sought after. However, most existing monitoring procedures directly apply a false discovery rate (FDR) controlling procedure to the data at each time point, and the FDR at each time point (the pointwise FDR) is either specified by users or determined by the in-control (IC) average run length (ARL). If the pointwise FDR is specified by users, the resulting procedure lacks control of the global FDR and leaves users in the dark about the IC ARL. If the pointwise FDR is determined by the IC ARL, the resulting procedure does not give users the flexibility to choose the number of false alarms (Type-I errors) they can tolerate when identifying abnormal data streams, which often makes the procedure too conservative. To address these limitations, a two-stage monitoring procedure is proposed that controls both the IC ARL and the Type-I errors at user-specified levels. As a result, the proposed procedure allows users to choose not only how often they expect false alarms when all data streams are IC but also how many false alarms they can tolerate when identifying abnormal data streams. Owing to this extra flexibility, the proposed two-stage monitoring procedure is shown to outperform the existing methods in a simulation study and a real data analysis.
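
To make the pointwise approach described above concrete, the following is a minimal sketch of applying an FDR-controlling procedure independently at each time point. The abstract does not say which FDR-controlling procedure the existing methods use; the Benjamini-Hochberg step-up procedure is assumed here purely for illustration. The function names `benjamini_hochberg` and `pointwise_monitor`, and the assumption that each time point supplies one p-value per data stream, are hypothetical. This is not the proposed two-stage procedure.

```python
import numpy as np


def benjamini_hochberg(p_values, fdr_level):
    """Flag streams with the Benjamini-Hochberg step-up rule (assumed procedure).

    Returns a boolean mask with True for every stream declared abnormal.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                        # stream indices, smallest p first
    thresholds = fdr_level * np.arange(1, m + 1) / m
    below = p[order] <= thresholds               # which ranks meet their threshold
    flagged = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()           # largest rank meeting its threshold
        flagged[order[: k + 1]] = True           # reject everything up to that rank
    return flagged


def pointwise_monitor(p_value_stream, fdr_level):
    """Apply the rule at each time point; stop at the first time any stream is flagged.

    Returns (time index, indices of flagged streams), or (None, empty array)
    if no signal occurs.
    """
    for t, p_values in enumerate(p_value_stream):
        flagged = benjamini_hochberg(p_values, fdr_level)
        if flagged.any():
            return t, np.nonzero(flagged)[0]
    return None, np.array([], dtype=int)


# Hypothetical p-values for four streams at two time points.
stream = [
    [0.50, 0.60, 0.70, 0.80],
    [0.001, 0.50, 0.90, 0.02],
]
signal_time, abnormal = pointwise_monitor(stream, fdr_level=0.1)
```

In this sketch the pointwise FDR level is a single number fixed in advance, so it illustrates the trade-off named in the abstract: the user either picks that level directly, with no handle on the IC ARL, or tunes it to hit a target IC ARL and gives up the choice of Type-I error level.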

Full Text