Abstract

Concept drift detection plays an important role in the context of data streams. It allows one to identify changes in data behavior over time, which are intrinsically associated with the phenomena responsible for producing such sequences of observations. By detecting such changes, one can better understand those phenomena and make better decisions in different application domains, e.g. the stock market, climate change, and population growth. Despite several proposals, most studies lack formal guarantees for concept drift detection. More recently, Vallim and Mello proposed 1DFT (Unidimensional Fourier Transform), an algorithm that detects drifts on unidimensional streams while holding a stability property based on surrogate series. Motivated by their work, we propose the multidimensional surrogate stability concept, which extends their approach to multidimensional data streams. In addition, our approach, named MDFT (Multidimensional Fourier Transform), employs a different and more robust measurement to analyze drifts, based on Shannon's and Von Neumann's entropies, to quantify variations in data spaces. As a final contribution, MDFT allows unidimensional streams to be reconstructed in phase spaces so that their data dependencies can also be analyzed to draw conclusions about concept drifts over time. The experiments considered seven synthetic data streams of 120,000 observations each. Synthetic data was used because it allows us to define, using the largest Lyapunov exponent, the exact points of change at which our approach should trigger concept drift events. The experiments compared MDFT against the main concept drift detection algorithms from Machine Learning (Page-Hinkley Test, PHT; Adaptive Windowing, ADWIN; and Cumulative Sum Control Chart, CUSUM) and from Dynamical Systems (Recurrence Quantification Analysis using different measurements, RQA; and Permutation Entropy, PE).
Results confirm that MDFT outperforms the other algorithms in terms of an average measurement (using the Euclidean distance) based on the Missed Detection Rate (MDR), the Mean Time for Detection (MTD), and the Mean Time between False Alarms (MTFA).
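The surrogate series mentioned above are commonly generated by Fourier phase randomization: the amplitude spectrum of a window is kept (so its power spectrum and linear autocorrelation are preserved) while the phases are drawn at random. The sketch below shows that generic technique in Python with NumPy. It is an assumption that 1DFT or MDFT use exactly this generator, and the function name `phase_randomized_surrogate` is hypothetical.

```python
import numpy as np


def phase_randomized_surrogate(x, rng=None):
    """Generic Fourier phase-randomized surrogate (hypothetical helper).

    Keeps the amplitude spectrum of the real 1-D series x and rotates every
    non-DC, non-Nyquist Fourier coefficient by a uniformly random phase.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    # Random phase rotation per frequency bin; |spectrum| is unchanged.
    phases = rng.uniform(0.0, 2.0 * np.pi, spectrum.size)
    # Leave the DC bin untouched so the mean of the series is preserved.
    phases[0] = 0.0
    # For even n the last bin is the Nyquist term, which must stay real.
    if n % 2 == 0:
        phases[-1] = 0.0
    rotated = spectrum * np.exp(1j * phases)
    # Back to the time domain with the original length.
    return np.fft.irfft(rotated, n=n)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    window = np.sin(0.1 * np.arange(512)) + 0.2 * rng.standard_normal(512)
    surrogates = [phase_randomized_surrogate(window, rng) for _ in range(5)]
    print(len(surrogates), surrogates[0].shape)
```

In a surrogate-based test, a statistic computed on the original window is compared with its distribution over many such surrogates; the specific statistic and comparison used by 1DFT and MDFT are not given in this abstract.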
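The abstract also names two ingredients without spelling them out: reconstructing a unidimensional stream in a phase space, and quantifying the resulting data space with Shannon's and Von Neumann's entropies. The Python sketch below shows one conventional way to compute each, using time-delay embedding, a histogram-based Shannon entropy, and the Von Neumann entropy of a trace-normalized covariance matrix treated as a density matrix. None of this is taken from the MDFT paper: the embedding dimension `m`, the delay `tau`, the binning, the density-matrix construction, and all function names are assumptions for illustration.

```python
import numpy as np


def delay_embed(x, m, tau):
    """Time-delay embedding: row t is (x[t], x[t+tau], ..., x[t+(m-1)*tau])."""
    x = np.asarray(x, dtype=float)
    n_rows = x.size - (m - 1) * tau
    if n_rows <= 0:
        raise ValueError("series too short for this (m, tau)")
    return np.column_stack([x[i * tau : i * tau + n_rows] for i in range(m)])


def shannon_entropy(points, bins=10):
    """Shannon entropy (bits) of a multidimensional histogram of the points."""
    counts, _ = np.histogramdd(points, bins=bins)
    p = counts.ravel() / counts.sum()
    p = p[p > 0]  # empty cells contribute 0 * log 0 = 0
    return float(-(p * np.log2(p)).sum())


def von_neumann_entropy(points):
    """Von Neumann entropy (bits), S = -tr(rho log2 rho), where rho is the
    scatter (covariance) matrix of the points normalized to unit trace."""
    centered = points - points.mean(axis=0)
    scatter = centered.T @ centered
    trace = np.trace(scatter)
    if trace == 0:
        return 0.0  # constant data: a single pure state
    rho = scatter / trace
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]  # drop numerically zero eigenvalues
    return float(-(eigvals * np.log2(eigvals)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(2000)
    series = np.sin(0.05 * t) + 0.1 * rng.standard_normal(t.size)
    emb = delay_embed(series, m=3, tau=5)
    print(shannon_entropy(emb), von_neumann_entropy(emb))
```

Tracking how such entropies change between consecutive windows is one way to quantify variations in the reconstructed data space; how MDFT turns these quantities into drift decisions is described in the full paper, not here.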
