Abstract

This paper considers a practical scenario, data stream classification, and uses online active learning to assist the classifier. Most active learning methods are based on label uncertainty, whereas methods based on representativeness face difficulties in streaming scenarios. Their sampling criteria are usually fixed, which makes it difficult for them to work effectively in dynamic sample spaces, especially for data streams with concept drift. This paper proposes a novel online active learning framework based on sample representativeness. The framework uses local nearest-neighbor information to measure the representativeness of unlabeled samples and divides the maximum influence space of the representative sample. We also develop an independent mechanism to identify and store short-term cluster fragments to preserve the integrity of information. The framework is deployed in an online environment and is compatible with any incremental learner. Simulation experiments on multiple datasets and classifiers show that our algorithm can improve the performance of the primary classifier. Compared with other online active learning methods, the proposed method achieves more stable accuracy and better noise resistance with fewer labeled samples.
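The idea of scoring unlabeled stream samples by local nearest-neighbor information can be illustrated with a minimal sketch. This is not the paper's algorithm: the class name `RepresentativenessSampler`, the score `1 / (1 + mean k-NN distance)`, the fixed `threshold`, and the use of the k-th neighbor distance as an "influence radius" are all assumptions chosen for illustration, and the short-term cluster fragment mechanism is not modeled.

```python
import math
from collections import deque


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class RepresentativenessSampler:
    """Hypothetical stream sampler: query labels only for samples that sit in
    a locally dense region not already covered by a chosen representative."""

    def __init__(self, k=3, window_size=50, threshold=0.5):
        self.k = k
        self.threshold = threshold
        # Sliding window of recent samples; old samples fall out automatically,
        # which is one simple way to follow a changing sample space.
        self.window = deque(maxlen=window_size)
        # Selected representatives stored as (point, influence_radius).
        self.representatives = []

    def score(self, x):
        """Return (representativeness, k-th neighbor distance) for x."""
        if len(self.window) < self.k:
            return 0.0, 0.0  # not enough neighbors to judge yet
        dists = sorted(euclidean(x, p) for p in self.window)[: self.k]
        mean_dist = sum(dists) / self.k
        # Assumed score: close neighbors give a value near 1, far ones near 0.
        return 1.0 / (1.0 + mean_dist), dists[-1]

    def should_query(self, x):
        """Decide whether to request a label for x, then add x to the window."""
        rep, radius = self.score(x)
        covered = any(euclidean(x, c) <= r for c, r in self.representatives)
        self.window.append(x)
        if rep >= self.threshold and not covered:
            self.representatives.append((x, radius))
            return True
        return False


if __name__ == "__main__":
    sampler = RepresentativenessSampler(k=3, window_size=50, threshold=0.5)
    stream = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (10.0, 10.0)]
    for x in stream:
        print(x, sampler.should_query(x))
```

In this toy stream, the first three points only fill the window, the fourth is queried as a representative of the dense region, the fifth is skipped because it falls inside that representative's radius, and the distant point is skipped because its neighbors are far away. A queried sample would then be labeled and passed to whatever incremental learner is in use.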
