Abstract

In real-world applications, the distribution of a data stream keeps changing. This problem is usually called concept drift [1]. The state-of-the-art decision tree classification method CVFDT [2] handles concept drift well, but its efficiency suffers because it processes every instance with the same general method, without considering the type of concept drift. In this paper, an algorithm called Efficient CVFDT (E-CVFDT) is proposed to improve the efficiency of CVFDT. E-CVFDT introduces a cache mechanism and handles instances differently for each of three types of concept drift: accidental, gradual, and instantaneous. In addition, E-CVFDT sends cached instances that have similar attributes to the information gain calculation in batches, rather than one at a time as CVFDT does. Experiments are carried out on the MOA platform. The results show that E-CVFDT achieves both better efficiency and higher accuracy than CVFDT.
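
The abstract describes the batching idea only at a high level. The Python sketch below illustrates it under stated assumptions and is not the authors' E-CVFDT implementation. It shows how instances cached at a leaf can be folded into the leaf's class counts in one batch, so that information gain is evaluated once per batch instead of once per instance. The names `LeafStats` and `BatchingLeaf`, the fixed `batch_size` flush rule, and the dictionary-based instance format are all hypothetical. The abstract does not say how E-CVFDT decides which instances have "similar attributes", so this sketch simply batches consecutive instances reaching the same leaf. It also omits CVFDT's split decision and drift handling entirely.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(class_counts):
    """Shannon entropy (in bits) of a mapping from class label to count."""
    total = sum(class_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total)
                for c in class_counts.values() if c > 0)


class LeafStats:
    """Hypothetical per-leaf sufficient statistics: counts[attr][value][label]."""

    def __init__(self):
        self.class_counts = Counter()
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.recomputations = 0  # number of times gains were evaluated

    def add_batch(self, batch):
        """Fold a batch of (attributes_dict, label) pairs into the counts."""
        for attrs, label in batch:
            self.class_counts[label] += 1
            for attr, value in attrs.items():
                self.counts[attr][value][label] += 1

    def info_gain(self, attr):
        """Information gain of splitting this leaf on `attr`."""
        total = sum(self.class_counts.values())
        if total == 0:
            return 0.0
        remainder = 0.0
        for value_counts in self.counts[attr].values():
            n_value = sum(value_counts.values())
            remainder += (n_value / total) * entropy(value_counts)
        return entropy(self.class_counts) - remainder

    def best_attribute(self):
        """Evaluate every attribute's gain once and return the best one."""
        self.recomputations += 1
        if not self.counts:
            return None
        return max(self.counts, key=self.info_gain)


class BatchingLeaf:
    """Hypothetical leaf that caches instances and flushes them as one batch."""

    def __init__(self, batch_size):
        self.stats = LeafStats()
        self.cache = []
        self.batch_size = batch_size

    def learn(self, attrs, label):
        """Cache one instance; on a full cache, update counts and re-evaluate."""
        self.cache.append((attrs, label))
        if len(self.cache) < self.batch_size:
            return None  # no information gain computation for this instance
        self.stats.add_batch(self.cache)
        self.cache.clear()
        return self.stats.best_attribute()
```

For example, with `batch_size=4`, eight instances trigger only two information gain evaluations, whereas a per-instance scheme would evaluate after each of the eight. The trade-off is that the leaf's statistics lag by up to `batch_size - 1` instances between flushes.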
