Abstract

Data streams are prone to various forms of concept drift over time, including changes in the relevance of individual features. This specific kind of drift, known as feature drift, requires techniques tailored not only to determine which features are currently the most important but also to take advantage of them. Feature selection has been studied extensively and shown to improve classifier performance in standard batch data mining, yet it remains largely unexplored in data stream mining. This paper presents Iterative Subset Selection (ISS), a novel feature subset selection method specialized for handling feature drifts. ISS splits the feature selection process into two stages: it first ranks the features using a scoring function, and then iteratively selects feature subsets using this ranking. This work further extends our prior work by feeding information from the subset selection stage back into the ranking process. Applying our method to the Naïve Bayes and k-Nearest Neighbour classifiers, we obtain compelling accuracy improvements compared to existing works.
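The two-stage structure described above (rank features with a scoring function, then iteratively select a subset along that ranking) can be sketched as follows. This is a minimal illustration of the general rank-then-select idea, not the authors' exact ISS algorithm: the correlation-based scoring function and the nearest-centroid evaluation used here are placeholder assumptions, since the abstract only says "some scoring function".

```python
import numpy as np

def rank_features(X, y):
    """Stage 1: rank features best-first using a placeholder scoring
    function (absolute correlation of each feature with the label)."""
    scores = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if col.std() == 0:
            scores.append(0.0)  # constant feature carries no information
        else:
            scores.append(abs(np.corrcoef(col, y)[0, 1]))
    return np.argsort(scores)[::-1]

def iterative_subset_selection(X, y, evaluate):
    """Stage 2: walk the ranking, growing the subset greedily and
    keeping a feature only if it improves the evaluation score."""
    ranking = rank_features(X, y)
    subset, best = [], -np.inf
    for f in ranking:
        candidate = subset + [int(f)]
        score = evaluate(X[:, candidate], y)
        if score > best:
            best, subset = score, candidate
    return subset

def centroid_accuracy(Xs, y):
    """Toy evaluator: training accuracy of a nearest-centroid classifier
    on the candidate feature subset (stand-in for a real wrapper score)."""
    c0 = Xs[y == 0].mean(axis=0)
    c1 = Xs[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - c1, axis=1)
            < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return (pred == y).mean()
```

For example, on synthetic data where feature 0 tracks the label and the remaining features are noise, the loop selects the informative feature first and tends to reject the noise features, since adding them does not improve the evaluation score.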
