Abstract
With the rapid development of data-driven technologies, massive amounts of real data emerge from industrial systems, forming data streams. The data distribution of such streams may change over time, and outliers may arise, yielding unbalanced imperfect data, because of time-varying working conditions, aging equipment, and similar factors. Previous methods contend with the dual challenges of concept drift and imbalance but fail to efficiently distinguish outliers from drift under a limited labeling budget, which degrades performance. To address this issue, a robust online active learning method with cluster-based local drift detection is proposed to classify unbalanced imperfect data streams with the above characteristics. The cluster-based local drift detection is first designed to capture a new concept and identify the corresponding drifted regions. Following that, an improved active learning mechanism is presented to distinguish outliers from drift and to select the most valuable instances for labeling and for updating the ensemble classifier. Experimental results on eight synthetic and four real-world data streams show that the proposed method outperforms seven comparative methods in classification accuracy and robustness.