Abstract
In a non-stationary environment, newly received data may exhibit knowledge patterns that differ from those in the data used to train learning models. As time passes, a learning model’s performance may become increasingly unreliable. This problem is known as concept drift and is a common issue in real-world domains. Concept drift detection has attracted increasing attention in recent years. However, very few existing methods pay attention to small regional drifts, and their accuracy may vary due to differing statistical significance tests. This paper presents a novel concept drift detection method based on regional-density estimation, named nearest neighbor-based density variation identification (NN-DVI). It consists of three components. The first is a k-nearest neighbor-based space-partitioning schema (NNPS), which transforms unmeasurable discrete data instances into a set of shared subspaces for density estimation. The second is a distance function that accumulates the density discrepancies in these subspaces and quantifies the overall differences. The third component is a tailored statistical significance test by which the confidence interval of a concept drift can be accurately determined. The distance applied in NN-DVI is sensitive to regional drift and has been proven to follow a normal distribution. As a result, NN-DVI’s accuracy and false-alarm rate are statistically guaranteed. In addition, the method has been evaluated on several benchmarks, including both synthetic and real-world datasets. The overall results show that NN-DVI performs better at addressing concept drift detection problems.
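The partition-and-accumulate idea in the abstract can be sketched as follows. This is a hedged illustration only, not the authors' implementation: the function name `nnps_distance`, the choice `k=5`, and the inverse-sharing weights are all assumptions made for the sketch.

```python
import numpy as np

def nnps_distance(S1, S2, k=5):
    """Illustrative NNPS-style distance: partition the pooled sample into
    k-nearest-neighbor neighborhoods (shared subspaces) and accumulate the
    density discrepancy of the two samples across those subspaces."""
    pooled = np.vstack([S1, S2])
    n1 = len(S1)
    # pairwise Euclidean distances over the pooled sample
    diff = pooled[:, None, :] - pooled[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # membership matrix: row i marks the k nearest neighbors of instance i
    M = np.zeros(D.shape, dtype=bool)
    idx = np.argsort(D, axis=1)[:, :k]
    np.put_along_axis(M, idx, True, axis=1)
    # weight each instance inversely by how many neighborhoods share it
    w = 1.0 / M.sum(axis=0)
    dens1 = (M[:n1] * w).sum(axis=0)  # per-subspace density of sample 1
    dens2 = (M[n1:] * w).sum(axis=0)  # per-subspace density of sample 2
    # accumulate normalized discrepancies; 0 = identical, 1 = disjoint
    return np.abs(dens1 - dens2).sum() / (dens1 + dens2).sum()
```

Because every subspace is local (a k-NN neighborhood), a drift confined to a small region still shifts the densities of the neighborhoods covering that region, which is what makes this style of distance sensitive to regional drift.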
Highlights
As technology advances, it has become increasingly easy to collect and organize data from different sources
In this research, according to Definition 13, we prove that the d_nnps of two i.i.d. sample sets follows a normal distribution
Evaluating the nearest neighbor-based density variation identification (NN-DVI) on real-world datasets: to demonstrate how our drift detection algorithm improves the performance of learning models in real-world scenarios, we compared our detection method with two closely related algorithms, 1) KL [14] and 2) CM [46], on five benchmark real-world concept drift datasets
Summary
It has become increasingly easy to collect and organize data from different sources. The term concept drift in the machine learning field refers to a phenomenon in knowledge patterns where the data distribution continues to change over time [41]. In real-world scenarios, these types of changes are barely perceptible [46, 45]. For this reason, instead of assuming a stationary environment, an effective learning model must always be alert to concept drift, and must track and adapt to drifts quickly [18, 28, 47]. Category 1) methods actively detect concept drifts at every time step and react after confirming a drift. They can be further divided into three subcategories [46]: a) data distribution-based drift detection, b) learner output-based drift detection, and c) learner parameter-based drift detection.
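A data distribution-based detector of subcategory a) can be sketched as below. This is a hedged illustration, not NN-DVI's tailored test (which relies on the proven normality of its distance): here the normal null is calibrated empirically by permuting the pooled data, and `drift_test`, `alpha`, and `n_perm` are names and defaults invented for the sketch.

```python
import numpy as np
from math import erf, sqrt

def drift_test(S1, S2, dist_fn, alpha=0.01, n_perm=200, seed=0):
    """Sketch of a distribution-based drift check: calibrate a normal null
    for a two-sample distance by permuting the pooled data, then run a
    one-sided z-test on the observed distance."""
    rng = np.random.default_rng(seed)
    observed = dist_fn(S1, S2)
    pooled = np.vstack([S1, S2])
    n1 = len(S1)
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(len(pooled))
        null[i] = dist_fn(pooled[perm[:n1]], pooled[perm[n1:]])
    z = (observed - null.mean()) / null.std()
    p = 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))  # one-sided p-value
    return p < alpha, z
```

At each time step the detector compares a reference window with the most recent window; a drift is confirmed only when the observed distance is significantly larger than the null, which keeps the false-alarm rate at roughly the chosen `alpha` level.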