Abstract

In a non-stationary environment, newly received data may have different knowledge patterns from the data used to train learning models. As time passes, a learning model’s performance may become increasingly unreliable. This problem is known as concept drift and is a common issue in real-world domains. Concept drift detection has attracted increasing attention in recent years. However, very few existing methods pay attention to small regional drifts, and their accuracy may vary due to differing statistical significance tests. This paper presents a novel concept drift detection method, based on regional-density estimation, named nearest neighbor-based density variation identification (NN-DVI). It consists of three components. The first is a k-nearest neighbor-based space-partitioning schema (NNPS), which transforms unmeasurable discrete data instances into a set of shared subspaces for density estimation. The second is a distance function that accumulates the density discrepancies in these subspaces and quantifies the overall difference. The third is a tailored statistical significance test by which the confidence interval of a concept drift can be accurately determined. The distance applied in NN-DVI is sensitive to regional drift and has been proven to follow a normal distribution. As a result, NN-DVI’s accuracy and false-alarm rate are statistically guaranteed. Additionally, several benchmarks, including both synthetic and real-world datasets, have been used to evaluate the method. The overall results show that NN-DVI performs better on concept drift detection problems than the compared methods.
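The density-discrepancy idea behind the distance can be illustrated roughly as follows. This is a hedged, simplified sketch and not the paper's actual NNPS construction: here each pooled instance's subspace is simply its k nearest neighbors, and the discrepancy is the deviation of each subspace's sample-1 membership fraction from its expected value. The function name `nnps_distance` and all parameters are illustrative.

```python
import numpy as np

def nnps_distance(s1, s2, k=5):
    """Sketch of a kNN-based density-discrepancy distance.

    For each instance in the pooled sample, take its k nearest
    neighbours as a local subspace, measure what fraction of the
    subspace comes from sample 1, and accumulate how far that
    fraction deviates from its expectation under no drift.
    """
    pooled = np.vstack([s1, s2])
    n1, n = len(s1), len(s1) + len(s2)
    # Pairwise squared distances within the pooled sample.
    d2 = ((pooled[:, None, :] - pooled[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)        # exclude self from neighbours
    # Indices of the k nearest neighbours of each pooled instance.
    nn = np.argsort(d2, axis=1)[:, :k]
    # Fraction of each local subspace drawn from sample 1
    # (indices below n1 belong to sample 1).
    frac1 = (nn < n1).mean(axis=1)
    # Average deviation from the no-drift expectation n1/n.
    return float(np.abs(frac1 - n1 / n).mean())
```

Under no drift the membership fractions hover near n1/n and the distance stays small; when the two samples occupy different regions, each subspace is dominated by one sample and the distance grows, which is why such a statistic is sensitive to regional drift.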

Highlights

  • As technology advances, it has become increasingly easy to collect and organize data from different sources.

  • In this research, according to Definition 13, we prove that the d_nnps of two i.i.d. sample sets follows a normal distribution.

  • We evaluate nearest neighbor-based density variation identification (NN-DVI) on real-world datasets: to demonstrate how our drift detection algorithm improves the performance of learning models in real-world scenarios, we compare it with two closely related algorithms, 1) KL [14] and 2) CM [46], on five benchmark real-world concept drift datasets.
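
Given the normality result stated in the highlights, the drift decision can be sketched as a one-sided z-test on the observed distance. This is an illustrative stand-in for the paper's tailored significance test; the null parameters `mu0` and `sigma0` and the function name are assumptions, not taken from the paper.

```python
from math import erf, sqrt

def drift_detected(d_obs, mu0, sigma0, alpha=0.01):
    """Flag drift when the observed distance d_obs is improbably
    large under the null N(mu0, sigma0**2): a plain one-sided
    z-test standing in for the paper's tailored test."""
    z = (d_obs - mu0) / sigma0
    p_value = 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))  # P(Z > z)
    return p_value < alpha
```

Because the null distribution is (proven) normal, the false-alarm rate is controlled directly by the chosen significance level alpha, which is what makes the accuracy guarantee statistical rather than heuristic.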

Summary

Introduction

It has become increasingly easy to collect and organize data from different sources. The term concept drift in the machine learning field refers to a phenomenon in knowledge patterns where the data distribution continues to change over time [41]. In real-world scenarios, these types of changes are barely perceptible [46, 45]. For this reason, instead of assuming a stationary environment, an effective learning model must always be alert to concept drifts, and track and adapt to them quickly [18, 28, 47]. Methods in the first category actively detect concept drifts at every time step and react after confirming a drift. They can be further divided into three subcategories [46]: a) data distribution-based drift detection, b) learner output-based drift detection, and c) learner parameter-based drift detection.

Literature Review
Preliminary
Nearest Neighbor-based Density Variation Identification
Modelling Data as a Set of High-Resolution Partitions
A Tailored Statistical Significance Test for dnnps
Stream Learning with Nearest Neighbor-based Density Variation Identification
Information Granularity Indicator for NNPS
Experiments and Evaluation
Findings
Conclusions and Further Work