Abstract
In a streaming environment, the characteristics of the data themselves and their relationship with the labels are likely to change over time. Most drift detection methods for supervised data streams are performance-based, that is, they detect changes only after the classification accuracy deteriorates. This may not be sufficient in many application areas where the reason behind a drift is also important. Another category of drift detectors is data distribution-based detectors. Although they can detect some drifts within the input space, changes affecting only the labelling mechanism cannot be identified. Furthermore, little work is available on drift detection for high-dimensional supervised data streams. In this paper we propose an advanced Hierarchical Reduced-space Drift Detection Framework for Supervised Data Streams (HRDS) which captures drifts regardless of their effects on classification performance. This framework suggests monitoring both marginal and class-conditional distributions within a lower-dimensional space specifically relevant to the assigned classification task. Experimental comparisons have demonstrated that the proposed HRDS not only achieves high-quality performance on high-dimensional data streams, but also outperforms its competitors in terms of detection recall, precision and F-measure across a wide range of different concept drift types including subtle drifts.
Highlights
In real-world applications such as weather prediction, industrial quality control and fraud detection, data often arrives in the form of a stream.
The hierarchical change detection test (HCDT) has been shown to achieve a more advantageous false positive rate (FPR) versus detection delay (DD) trade-off than its single change detection test (CDT) counterpart, but it has only been tested on unlabelled scalar data [21].
Although we provide one possible realization for a binary classification problem as an illustrative example in this paper, the general framework of Hierarchical Reduced-space Drift Detection (HRDD) is suitable for multi-class data streams.
Summary
In real-world applications such as weather prediction, industrial quality control and fraud detection, data often arrives in the form of a stream. HCDT has been shown to achieve a more advantageous false positive rate (FPR) versus detection delay (DD) trade-off than its single CDT counterpart, but it has only been tested on unlabelled scalar data [21]. Direct application of this framework to multivariate supervised data streams still suffers from the aforementioned deficiencies of distribution-based detectors. The contributions of our work include: 1) A new hierarchical detection framework proposed for supervised data streams that detects both real and virtual drifts. Although we provide one possible realization for a binary classification problem as an illustrative example in this paper, the general framework of HRDD is suitable for multi-class data streams.
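The idea of monitoring both marginal and class-conditional distributions in a reduced space can be illustrated with a minimal sketch. This is not the paper's HRDS/HRDD realization: the projection onto the difference of class means, the two-sample Kolmogorov-Smirnov test with its asymptotic threshold, and all function names (`class_mean_direction`, `ks_statistic`, `detect_drift`) are assumptions chosen only to keep the example self-contained. It also omits the hierarchical validation layer and any correction for running several tests at once.

```python
import math
import random


def class_mean_direction(X, y):
    """Unit vector from the class-0 mean to the class-1 mean.
    Assumption: a stand-in for a task-relevant 1-D reduced space."""
    dim = len(X[0])
    means = []
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        means.append([sum(r[j] for r in rows) / len(rows) for j in range(dim)])
    w = [means[1][j] - means[0][j] for j in range(dim)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    return [v / norm for v in w]


def project(X, w):
    """Project every sample onto direction w."""
    return [sum(a * b for a, b in zip(x, w)) for x in X]


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_reject(a, b, alpha=0.01):
    """Reject equality of distributions using the asymptotic KS threshold."""
    n, m = len(a), len(b)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical


def detect_drift(ref_X, ref_y, new_X, new_y, alpha=0.01):
    """Compare a reference window with a new window in the reduced space.
    Assumes binary labels 0/1 and that both classes occur in both windows."""
    w = class_mean_direction(ref_X, ref_y)  # fitted on the reference window only
    ref_z, new_z = project(ref_X, w), project(new_X, w)
    report = {"marginal": ks_reject(ref_z, new_z, alpha)}
    for c in (0, 1):
        ref_c = [z for z, label in zip(ref_z, ref_y) if label == c]
        new_c = [z for z, label in zip(new_z, new_y) if label == c]
        report[f"class_{c}"] = ks_reject(ref_c, new_c, alpha)
    return report


# Synthetic demo: identical inputs, but the labelling rule is inverted.
random.seed(0)
ref_X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(1000)]
ref_y = [1 if x[0] > 0 else 0 for x in ref_X]
new_X = [list(x) for x in ref_X]
new_y = [1 - label for label in ref_y]
print(detect_drift(ref_X, ref_y, new_X, new_y))
```

In the demo the two windows contain the same inputs, so the marginal test has nothing to detect, while the class-conditional tests see the inverted labelling. This is the kind of label-only change that a purely input-space detector would miss.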
More From: IEEE Transactions on Knowledge and Data Engineering