Abstract

Data streams are expected to undergo changes in data distribution, a phenomenon called concept drift. A closely related phenomenon is feature drift, which occurs whenever a subset of features becomes, or ceases to be, relevant to the learning task. Identifying the most relevant feature subset from a high-dimensional feature space is challenging in the stream mining scenario. In this study, we propose an online dynamic feature weighting algorithm. Specifically, a feature drift detection scheme is introduced that monitors changes in the class relevance of the features through a change-detection algorithm based on a log-likelihood divergence score. The score is computed via a kernel density estimator over information-theoretic feature merit values. The algorithm is evaluated on both synthetic and real-world datasets, and it is shown that the proposed distribution-based drift detection framework can boost the accuracy of Nearest Neighbor and Naive Bayes classifiers (by an average of 2.7% and 5.5%, respectively). It also signals feature drifts much faster than traditional methods based on detecting changes in accuracy rates. Finally, the limitations of the proposed method are assessed, and future research directions are discussed.
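The core idea described above can be illustrated with a minimal sketch: estimate the density of a reference window of feature merit values with a Gaussian kernel density estimator, then score a current window by its negative log-likelihood under that density. The function names, the bandwidth, and the threshold convention below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def gaussian_kde(sample, bandwidth):
    """Return a pdf built as the average of Gaussian kernels
    centered at each reference sample point (illustrative KDE)."""
    def pdf(x):
        z = (x[:, None] - sample[None, :]) / bandwidth
        return np.exp(-0.5 * z ** 2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    return pdf

def log_likelihood_divergence(reference, current, bandwidth=0.1):
    """Score = mean negative log-likelihood of the current window's
    merit values under the reference window's KDE."""
    pdf = gaussian_kde(np.asarray(reference, dtype=float), bandwidth)
    ll = np.log(pdf(np.asarray(current, dtype=float)) + 1e-12)
    return -ll.mean()

# Toy usage: merit values of one feature before and after a drift.
rng = np.random.default_rng(0)
stable = rng.normal(0.8, 0.05, 200)    # feature stays relevant
drifted = rng.normal(0.2, 0.05, 200)   # feature loses relevance
score_same = log_likelihood_divergence(stable[:100], stable[100:])
score_drift = log_likelihood_divergence(stable[:100], drifted)
# A feature drift would be signalled when the score exceeds a tuned threshold.
```

When the current window is drawn from the same distribution as the reference, the score stays low; a drift in feature relevance pushes merit values into low-density regions of the reference KDE, and the score rises sharply.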
