Abstract

This paper studies the mining of frequent itemsets from streaming data in the presence of concept drift. Streaming data, being volatile in nature, is particularly challenging to mine. An approach using genetic algorithms is presented, and the relationships between concept drift, sliding window size, and genetic algorithm constraints are explored. Concept drift is identified by changes in the frequent itemsets. The novelty of this work lies in using frequent itemsets, found within a genetic algorithm framework, to detect concept drift while mining streaming data. Formulas are presented for calculating minimum support counts in streaming data using sliding windows. Testing showed that the ratio of the window size to the number of transactions per drift was key to good performance. Good results were hard to obtain when the sliding window was too small, since normal fluctuations in the data could appear to be a concept drift. Window size must be managed in conjunction with support and confidence values to achieve reasonable results. The method detected concept drift well when larger window sizes were used.

Highlights

  • Today’s digital world is constantly generating data from traffic sensors, health sensors, customer transactions, and various other Internet of Things (IoT) devices

  • Concept drift detection: in streaming data, a stable concept is identified by the set of frequent itemsets remaining constant in both number and content as data flows through the window

  • This testing highlighted that the ratio of the window size to transactions per drift was a key to good performance
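
The drift test described in the highlights can be illustrated with a minimal Python sketch. It is not the paper's method: the frequent itemsets are found here by brute-force counting rather than by a genetic algorithm, and the function names (`frequent_itemsets`, `detect_drift`), the `max_size` cap, and the support-count formula (relative support times window length, rounded up) are assumptions made for illustration only.

```python
from collections import deque
from itertools import combinations
from math import ceil


def frequent_itemsets(window, min_support, max_size=2):
    """Return the itemsets (as frozensets) that meet the support count.

    Brute-force counting is used only for brevity; it stands in for the
    genetic-algorithm search described in the paper.
    """
    # Assumed formula: support count = ceil(relative support * window length).
    min_count = ceil(min_support * len(window))
    counts = {}
    for transaction in window:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {itemset for itemset, n in counts.items() if n >= min_count}


def detect_drift(stream, window_size, min_support, max_size=2):
    """Yield each stream index at which the frequent itemsets change."""
    window = deque(maxlen=window_size)  # oldest transaction drops out automatically
    previous = None
    for index, transaction in enumerate(stream):
        window.append(transaction)
        if len(window) < window_size:
            continue  # wait until the window is full
        current = frequent_itemsets(window, min_support, max_size)
        if previous is not None and current != previous:
            yield index  # number or content of frequent itemsets changed
        previous = current


# Toy stream: six transactions of one concept, then six of another.
stream = [("a", "b")] * 6 + [("c", "d")] * 6
print(list(detect_drift(stream, window_size=4, min_support=0.5)))  # [7, 8]
```

In this toy stream the change is flagged twice: once when the new itemsets become frequent and once when the old ones fall below the support count. Because any change in the set triggers a flag, a small window would also react to ordinary fluctuations, which is why window size has to be chosen together with the support threshold.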


Summary

Introduction

Today’s digital world is constantly generating data from traffic sensors, health sensors, customer transactions, and various other Internet of Things (IoT) devices. Continuous, never-ending streams of Big Data are creating new challenges for data mining. Mining only static data in snapshots of time is no longer useful. Streaming data, being dynamic or volatile in nature, has patterns that change over time, a phenomenon known as concept drift. Algorithms developed for mining streaming data have to be able to detect and work with concept drift, hence the need for new streaming data mining approaches. This work looks at an important data mining technique, frequent itemset mining, applied to streaming transaction data in the presence of concept drift.
