Abstract
Multi-source stream classification is a prominent real-world problem made difficult by scarce true labels and non-stationary environments. Despite growing research in this field, most existing works assume that all true labels of the source or target streams are available for domain adaptation or transfer learning, which incurs high labeling costs. In real-world applications, however, labeled data are usually insufficient in both source and target streams, and the annotation costs of the source and target streams are generally unequal. We therefore propose a Cost-Sensitive Active Learning (CSAL) method for multi-source drifting streams. Specifically, a multi-source ensemble framework with an asymmetric weighting mechanism is presented to ensure beneficial knowledge transfer and avoid negative transfer. Then, a multi-perspective similarity estimation method is proposed to evaluate the similarity between source and target streams. On this basis, a novel cost-sensitive hybrid labeling strategy, which combines a volatility strategy and an uncertainty strategy with a cost-sensitive budget control mechanism, adaptively selects representative samples at appropriate times. Finally, a parallel multiple-hypothesis drift detection method is proposed, which efficiently uses true labels to detect concept drift. Experimental results on real-world and synthetic data streams show that CSAL outperforms state-of-the-art methods while using fewer labels.
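To make the idea of uncertainty-based labeling under a cost-sensitive budget concrete, the following is a minimal illustrative sketch and not the CSAL algorithm itself. It assumes a margin-based uncertainty test and a running average-cost budget. The class name `BudgetedLabeler`, its parameters, and the specific affordability rule are all hypothetical, and the volatility strategy, ensemble weighting, and drift detection described above are omitted.

```python
from dataclasses import dataclass


@dataclass
class BudgetedLabeler:
    """Hypothetical sketch: query a label only if the prediction is uncertain
    and the average annotation cost per observed sample stays within budget."""

    budget: float            # assumed: max average labeling cost per observed sample
    threshold: float = 0.2   # assumed: margin below which a prediction is "uncertain"
    spent: float = 0.0       # total annotation cost paid so far
    seen: int = 0            # number of samples observed so far

    def should_query(self, probs, cost):
        """probs: class probabilities for one sample (at least two classes).
        cost: annotation cost for the stream this sample came from."""
        self.seen += 1
        top = sorted(probs, reverse=True)
        margin = top[0] - top[1]  # small margin means high uncertainty
        affordable = (self.spent + cost) / self.seen <= self.budget
        if margin < self.threshold and affordable:
            self.spent += cost
            return True
        return False


labeler = BudgetedLabeler(budget=0.5)
decisions = [
    labeler.should_query([0.55, 0.45], cost=1.0),  # uncertain, but too costly so far
    labeler.should_query([0.50, 0.50], cost=1.0),  # uncertain and now affordable
    labeler.should_query([0.90, 0.10], cost=1.0),  # confident prediction, skipped
    labeler.should_query([0.50, 0.50], cost=0.5),  # uncertain, cheaper stream
]
```

Because the cost is passed per sample, a stream with cheaper annotation consumes less of the budget per query, which is one simple way to reflect unequal annotation costs across source and target streams.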