Abstract
The nature of data streams requires classification algorithms to be real-time, efficient, and able to cope with high-dimensional data that are continuously arriving. It is a known fact that in high-dimensional datasets, not all features are critical for training a classifier. To improve the performance of data stream classification, we propose an algorithm called HEFT-Stream ( H eterogeneous E nsemble with F eature drif T for Data Streams ) that incorporates feature selection into a heterogeneous ensemble to adapt to different types of concept drifts. As an example of the proposed framework, we first modify the FCBF [13] algorithm so that it dynamically update the relevant feature subsets for data streams. Next, a heterogeneous ensemble is constructed based on different online classifiers, including Online Naive Bayes and CVFDT [5]. Empirical results show that our ensemble classifier outperforms state-of-the-art ensemble classifiers (AWE [15] and OnlineBagging [21]) in terms of accuracy, speed, and scalability. The success of HEFT-Stream opens new research directions in understanding the relationship between feature selection techniques and ensemble learning to achieve better classification performance.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have