Efficient Mining of Data Streams Using Associative Classification Approach

Prasanna Lakshmi Kompalli,Ramesh Kumar Cherku

doi:10.1142/s0218194015500059

Abstract

Data stream associative classification poses many challenges to the data mining community. In this paper, we address four major challenges posed, namely, infinite length, extraction of knowledge with single scan, processing time, and accuracy. Since data streams are infinite in length, it is impractical to store and use all the historical data for training. Mining such streaming data for knowledge acquisition is a unique opportunity and even a tough task. A streaming algorithm must scan data once and extract knowledge. While mining data streams, processing time, and accuracy have become two important aspects. In this paper, we propose PSTMiner which considers the nature of data streams and provides an efficient classifier for predicting the class label of real data streams. It has greater potential when compared with many existing classification techniques. Additionally, we propose a compact novel tree structure called PSTree (Prefix Streaming Tree) for storing data. Extensive experiments conducted on 24 real datasets from UCI repository and synthetic datasets from MOA (Massive Online Analysis) show that PSTMiner is consistent. Empirical results show that performance of PSTMiner is highly competitive in terms of accuracy and performance time when compared with other approaches under windowed streaming model.

Full Text