Abstract

In traditional data stream mining, classification models are typically trained on labeled samples from a single source. In real-world scenarios, however, obtaining accurate labels is difficult and expensive, especially when multiple data streams are concurrently sampled from the same environment or process. To address this issue, multistream classification has been proposed, in which a data stream with biased labels (called the source stream) is leveraged to train a suitable model for prediction over another stream with unlabeled samples (called the target stream). Despite growing research in this field, previous multistream classification methods are mostly designed for single-source-stream scenarios, whereas multiple source streams contain diverse data distributions and thus provide more valuable information for building an accurate model. In addition, previous works construct classification models in the original shared feature space, ignoring the effect of redundant or low-quality features on classification performance, which may lead to inefficient knowledge transfer across streams. In view of this, a reduced-space multistream classification method based on multi-objective evolutionary optimization is proposed in this paper. First, multi-objective evolutionary optimization is employed to seek the most valuable feature subset shared by the source and target domains, with the purpose of narrowing the distribution difference between the source and target streams. Following that, a Gaussian Mixture Model-based weighting mechanism for source samples is presented. Moreover, two drift adaptation methods are proposed to address asynchronous drift. Experimental results on benchmark datasets show that the proposed method outperforms comparative methods in terms of classification accuracy and G-mean.
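The abstract does not detail the density-based weighting mechanism. As a rough illustrative sketch only (not the paper's actual method), the idea of weighting source samples by how well they match the target distribution can be shown with a single-Gaussian simplification of a GMM, fitted to target samples; all function and variable names below are hypothetical:

```python
import numpy as np

def gaussian_weights(source, target, eps=1e-6):
    """Weight each source sample by its (unnormalized) density under a
    Gaussian fitted to the target samples -- a one-component
    simplification of a GMM-based weighting scheme."""
    mu = target.mean(axis=0)
    # Regularize the covariance so it is always invertible
    cov = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])
    inv = np.linalg.inv(cov)
    diff = source - mu
    # Squared Mahalanobis distance of each source sample to the target mean
    maha_sq = np.einsum('ij,jk,ik->i', diff, inv, diff)
    w = np.exp(-0.5 * maha_sq)
    return w / w.sum()  # normalize weights to sum to 1

rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(200, 3))
# First 50 source samples resemble the target; the last 50 are shifted
source = np.vstack([rng.normal(0.0, 1.0, size=(50, 3)),
                    rng.normal(4.0, 1.0, size=(50, 3))])
w = gaussian_weights(source, target)
# Source samples close to the target distribution receive larger weights
print(w[:50].sum() > w[50:].sum())
```

Under a full GMM, the same idea applies per mixture component, with weights combined by the component responsibilities.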
