Abstract

Data stream mining deals with processing large amounts of data in nonstationary environments, where the relationship between the data and the labels often changes. Such dynamic relationships make it difficult to design a data stream processing algorithm that is both computationally efficient and adaptable to the nonstationarity of the environment. To provide this adaptability, concept drift detectors are attached to the algorithm; they detect changes in the environment by monitoring error rates and trigger adaptation to the environment’s current state. Unfortunately, current approaches to adapting to environmental changes assume that the data stream is fully labeled. This assumption is flawed, because labeling every instance is impractical given the rapid arrival and volume of the data. To address this issue, this study proposes to detect concept drift by anticipating a possible change in the true label in the high-confidence prediction region. This study also proposes an ensemble-based concept drift adaptation approach that transfers reliable classifiers to the new concept. The significance of the proposed approach compared to current baselines is that it neither uses a performance measure as the drift signal nor assumes a change in data distribution when concept drift occurs. As a result, the proposed approach can detect concept drift when labeled data are scarce, even when the data distribution remains static. Based on the results, this proposed approach can detect concept drifts and fully supervised data stream mining approaches and performs well on mixed-severity concept drift datasets.
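For readers unfamiliar with error-rate monitoring, the sketch below illustrates the conventional, fully supervised style of drift detection that the abstract contrasts itself with. It is not the approach proposed in this study. It is a simplified detector in the spirit of the Drift Detection Method (DDM), and the class name, function name, thresholds, and synthetic error streams are all assumptions made for illustration. Note that every update needs to know whether the prediction was wrong, which is exactly the labeling requirement the abstract argues is impractical.

```python
import math


class ErrorRateDriftDetector:
    """Hypothetical, simplified DDM-style detector (illustration only).

    Signals drift when the running error rate plus its standard deviation
    rises well above the lowest level observed so far.
    """

    def __init__(self, min_samples=30, drift_sigmas=3.0):
        self.min_samples = min_samples    # warm-up before any decision
        self.drift_sigmas = drift_sigmas  # how far above the best level counts as drift
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.best_p = float("inf")
        self.best_s = float("inf")

    def update(self, is_error):
        """Consume one labeled outcome; return True if drift is signaled."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n                # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)   # its binomial standard deviation
        if self.n < self.min_samples:
            return False
        if p + s < self.best_p + self.best_s:
            # Remember the best (lowest) error level seen so far.
            self.best_p, self.best_s = p, s
        drift = p + s > self.best_p + self.drift_sigmas * self.best_s
        if drift:
            self.reset()  # start fresh statistics for the new concept
        return drift


def first_drift_index(error_stream):
    """Return the index at which drift is first signaled, or None."""
    detector = ErrorRateDriftDetector()
    for t, is_error in enumerate(error_stream):
        if detector.update(is_error):
            return t
    return None


# Synthetic, deterministic streams (assumed for illustration):
# 10% errors for 500 steps, then 50% errors for 500 steps.
stable = [t % 10 == 0 for t in range(500)]
drifted = stable + [t % 2 == 0 for t in range(500)]

print("stable stream :", first_drift_index(stable + stable))
print("drifted stream:", first_drift_index(drifted))
```

Because the detector consumes one true/false error flag per instance, it cannot run when labels are scarce, and it reacts only after the error rate has already degraded.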
