Abstract

<p>A data stream is the large volume of data generated continuously in various fields, including financial processes, social media activity, Internet of Things applications, and many others. Such data cannot be processed with traditional data mining algorithms because of several constraints, including limited memory, data speed, and a dynamic environment. Concept Drift is the main challenge in data stream mining, particularly in the classification task. It refers to a change in the underlying distribution of the data stream over time, which degrades the accuracy of classification models and leads to wrong predictions. Spam emails, changes in consumer behavior, and adversary activities are examples of Concept Drift. This paper introduces a Concept Drift detection model, the Concept Drift Detection Model (CDDM). It monitors the accuracy of the classification model over a sliding window, on the assumption that a decline in accuracy indicates a drift. A weighted version of CDDM, W-CDDM, is also proposed.</p><p>Both models were evaluated on two real datasets and four artificial datasets. The experimental results for abrupt drift show that CDDM and W-CDDM outperform the other models on the datasets of 100K and 1M instances, respectively. For gradual drift, W-CDDM outperformed the rest in terms of accuracy, run time, and detection delay on the dataset of 100K instances, while on the dataset of 1M instances CDDM achieved the highest accuracy with the NB classifier. Moreover, W-CDDM achieves the highest accuracy on the real datasets.</p>

Highlights

  • Internet of Things (IoT), weather forecasting, telecommunications systems, and many other applications are examples of data stream applications

  • Concept Drift Detection Model (CDDM) and W-CDDM were evaluated against the Fast Hoeffding Drift Detection Method (FHDDM) and MDDM-A in experiments using four artificial datasets of different sizes simulating abrupt and gradual concept drifts, as well as two real-world datasets

  • W-CDDM achieved the best accuracy and the lowest False Positive (FP) value with both classifiers when the dataset size was increased to 1M instances


Summary

Introduction

Internet of Things (IoT), weather forecasting, telecommunications systems, and many other applications are examples of data stream applications. The use of such applications has resulted in vast, open-ended streams of data. The generated data needs to be analyzed so that knowledge can be extracted from it for different purposes. In general, Concept Drift is the phenomenon that deteriorates the accuracy of a learning model. It can be handled in classification models through different approaches, such as tracking the data probability distribution, controlling the model accuracy, or monitoring feature changes [2].
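To make the accuracy-based approach concrete, the following is a minimal sketch of a detector that tracks classifier accuracy over a sliding window and flags a drift when accuracy falls noticeably below the best level seen so far. It is not the published CDDM or W-CDDM algorithm: the class name `SlidingWindowAccuracyMonitor`, the `window_size` and `drop_threshold` parameters, and the reset-on-drift behavior are all assumptions chosen for illustration.

```python
from collections import deque


class SlidingWindowAccuracyMonitor:
    """Illustrative accuracy-drop detector (hypothetical, not the paper's CDDM)."""

    def __init__(self, window_size=100, drop_threshold=0.15):
        # window_size and drop_threshold are assumed values for this sketch.
        self.window_size = window_size
        self.drop_threshold = drop_threshold
        self.window = deque(maxlen=window_size)  # 1 = correct, 0 = wrong
        self.best_accuracy = 0.0

    def update(self, correct):
        """Record one prediction outcome; return True if a drift is flagged."""
        self.window.append(1 if correct else 0)
        if len(self.window) < self.window_size:
            return False  # wait until the window is full

        accuracy = sum(self.window) / self.window_size
        self.best_accuracy = max(self.best_accuracy, accuracy)

        if self.best_accuracy - accuracy > self.drop_threshold:
            # Treat the decline as a drift and start over on the new concept.
            self.window.clear()
            self.best_accuracy = 0.0
            return True
        return False


# Synthetic outcomes: about 90% correct at first, then about 50% correct
# after a simulated abrupt drift at position 200.
stream = [i % 10 != 0 for i in range(200)] + [i % 2 == 0 for i in range(200)]

monitor = SlidingWindowAccuracyMonitor()
detections = [i for i, ok in enumerate(stream) if monitor.update(ok)]
print(detections)
```

In a real stream, `correct` would come from comparing the classifier's prediction with the true label as each labeled instance arrives. A smaller window reacts faster but is noisier, which is one reason detectors of this kind are compared on detection delay and false positives as well as accuracy.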
