Abstract

Machine learning applications must continually use label information from the data stream to detect concept drift and adapt to dynamic behavior. Because label information is costly to obtain, it is impractical to assume that the data stream is fully labeled. Consequently, many semi-supervised concept drift detection methods have been proposed. Despite this large research effort, the literature lacks an analysis of how the information resources required relate to the achievable concept drift detection accuracy. This paper therefore addresses the unexplored research question, “How many labeled samples are required to detect concept drift accurately?”, by proposing an analytical framework to analyze and estimate the information resources required to detect concept drift accurately. Specifically, the paper decomposes the distribution-based concept drift detection task into a learning task and a dissimilarity measurement task, which are analyzed independently. The results of the two analyses are then combined to estimate the number of labels required within a set of data samples to detect concept drift accurately. The accuracy of this estimate is evaluated empirically, and the results suggest that the estimate is accurate when a large amount of information resources is provided. Additionally, estimation results for a state-of-the-art method and a benchmark data set are reported to show that the estimation from the proposed analytical framework is applicable in benchmarked environments. In general, the estimates from the proposed analytical framework can serve as guidance when designing systems with limited information resources. The paper also aims to help identify research gaps and inspire new research on the amount of information resources required for accurate concept drift detection.
