With the rapid development of data-driven technologies, massive amounts of real-world data emerge from industrial systems in the form of data streams. Owing to time-varying working conditions, aging equipment, and similar factors, the data distribution of such streams may change over time, and outliers may arise, yielding unbalanced imperfect data. Previous methods address the dual challenges of concept drift and class imbalance but fail to efficiently distinguish outliers from drift under a limited labeling budget, which degrades their performance. To address this issue, robust online active learning with cluster-based local drift detection is proposed to classify unbalanced imperfect data streams with the above characteristics. First, cluster-based local drift detection is designed to capture a new concept and recognize the corresponding drifted regions. Then, an improved active learning mechanism is presented to distinguish outliers from drift and to select the most valuable instances for labeling and for updating the ensemble classifier. Experimental results on eight synthetic and four real-world data streams show that the proposed method outperforms seven comparative methods in classification accuracy and robustness.
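To make the idea of selecting instances for labeling "under a limited labeling budget" concrete, the sketch below implements a generic, well-known baseline rather than the proposed mechanism: the variable-uncertainty strategy from the stream active-learning literature (Žliobaitė et al.). It assumes the classifier exposes a maximum posterior probability per instance; the class name `VariableUncertaintySampler` and its parameters (`budget`, `theta`, `step`) are hypothetical illustrations. It does not model outlier/drift discrimination or cluster-based drift detection, which are specific to the proposed method.

```python
import random
from dataclasses import dataclass


@dataclass
class VariableUncertaintySampler:
    """Hypothetical budget-constrained uncertainty sampler (variable-uncertainty style).

    Not the paper's method; a generic baseline for illustration only.
    """
    budget: float = 0.1   # assumed max fraction of stream instances to label
    theta: float = 1.0    # confidence threshold, adapted online
    step: float = 0.01    # multiplicative adjustment rate for theta
    seen: int = 0         # instances observed so far
    labeled: int = 0      # labels requested so far

    def query(self, max_posterior: float) -> bool:
        """Return True if the label of the current instance should be requested."""
        self.seen += 1
        # Stop querying while the spent fraction has reached the budget.
        if self.labeled / self.seen >= self.budget:
            return False
        if max_posterior < self.theta:
            # Uncertain instance: request its label and tighten the threshold.
            self.labeled += 1
            self.theta *= 1 - self.step
            return True
        # Confident instance: relax the threshold so later uncertain ones are queried.
        self.theta *= 1 + self.step
        return False


# Usage on a simulated stream of maximum posteriors (synthetic values, not real data).
rng = random.Random(0)
sampler = VariableUncertaintySampler(budget=0.1)
decisions = [sampler.query(rng.uniform(0.5, 1.0)) for _ in range(1000)]
```

Adapting the threshold multiplicatively lets the sampler keep querying the least confident instances as the classifier's confidence shifts after a drift, while the spent-fraction check keeps the number of labels near the budget.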