Abstract
Stream-based active learning (AL) strategies minimize the labeling effort by querying the labels that improve the classifier’s performance the most. So far, these strategies neglect the fact that an oracle or expert requires time to provide a queried label. We show that existing AL methods deteriorate or even fail under the influence of such verification latency. The problem with these methods is that they estimate a label’s utility on the currently available labeled data. However, by the time this label arrives, some of the current data may have become outdated and new labels may have arrived. In this article, we therefore propose to simulate the data that will be available at the time the label arrives. Our method, Forgetting and Simulating (FS), forgets outdated information and simulates the delayed labels to obtain more realistic utility estimates. We assume that the label’s arrival date is known a priori and that the classifier’s training data are bounded by a sliding window. Our extensive experiments show that FS improves stream-based AL strategies in settings with both constant and variable verification latency.
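The core idea can be illustrated with a minimal sketch: to estimate a label’s utility, FS constructs the training set that would be available at the label’s arrival time by forgetting instances that will have left the sliding window and adding simulated labels for queries that are still in transit. The sketch below assumes a temporal sliding window and simulated (e.g., classifier-predicted) labels for pending queries; the names (`LabeledInstance`, `SlidingWindow`, `simulate_training_set`) are illustrative and not the article’s implementation.

```python
from dataclasses import dataclass, field

@dataclass
class LabeledInstance:
    x: tuple          # feature vector
    y: int            # class label (or a simulated label for a pending query)
    t_label: float    # time at which the label becomes (or would become) available

@dataclass
class SlidingWindow:
    length: float                            # temporal window length (assumption)
    data: list = field(default_factory=list)

    def forget(self, t_now: float) -> None:
        """Drop instances whose labels will have left the window at time t_now."""
        self.data = [d for d in self.data if d.t_label > t_now - self.length]

def simulate_training_set(window: SlidingWindow,
                          pending: list,
                          t_arrival: float) -> list:
    """Approximate the labeled data available when the queried label arrives at
    t_arrival: forget outdated instances and add simulated labels for queries
    that are still in transit but due before t_arrival."""
    future = SlidingWindow(window.length, list(window.data))
    future.forget(t_arrival)
    future.data.extend(p for p in pending if p.t_label <= t_arrival)
    return future.data
```

The set returned by `simulate_training_set` would then stand in for the current labeled data when computing a candidate query’s utility, which is the more realistic estimate the abstract refers to.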
Highlights
This article addresses data stream classification in non-stationary environments, where instances appear successively and are initially unlabeled. To learn a classifier, we need labels for at least some of these instances.
The plots show that Random Selection (Rand) performs best under high latency compared to the active learning (AL) strategies.
The results suggest that the performance of traditional AL strategies decreases relative to selecting instances randomly.
Summary
This article addresses data stream classification in non-stationary environments, where instances appear successively and are initially unlabeled. We need labels for at least some of these instances. We can select some instances to be passed to an oracle for labeling, e.g., a human expert or a computationally intensive simulation. Such a label acquisition induces some sort of cost, which we (for simplicity) assume to be equal across all instances. Algorithms from the field of stream-based active learning (AL) aim to maximize the classifier’s performance under the given budget restrictions by selecting only the most informative instances for labeling. Similar to Zliobaite et al. (2014), we follow the common assumption that the AL strategy needs to decide immediately, at the moment an instance arrives, whether or not to acquire its label.
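As a point of reference, the protocol summarized above can be written down as a short sketch. The loop below uses a random-selection baseline (as in the Rand strategy mentioned in the highlights) to keep it minimal, a uniform labeling cost, and a fixed verification latency; `label_fn`, `budget`, and `latency` are illustrative assumptions rather than the article’s experimental setup, and `classifier` stands for any incremental learner with a `partial_fit` method.

```python
import random
from collections import deque

def run_stream_al(stream, label_fn, classifier, budget=0.1, latency=50):
    """Sketch of the stream-based AL protocol: an immediate query decision per
    arriving instance, a uniform labeling cost, a fixed budget, and queried
    labels that arrive only `latency` time steps later (verification latency)."""
    pending = deque()          # (arrival_time, instance, label) of queried labels
    seen = queried = 0
    for t, x in enumerate(stream):
        seen += 1
        # labels whose verification latency has elapsed become available now
        while pending and pending[0][0] <= t:
            _, x_l, y_l = pending.popleft()
            classifier.partial_fit([x_l], [y_l])
        # immediate, irrevocable query decision under the budget constraint;
        # random selection keeps the sketch minimal (an AL strategy would
        # instead rank instances by their estimated utility)
        if queried / seen < budget and random.random() < budget:
            pending.append((t + latency, x, label_fn(x)))
            queried += 1
    return classifier
```

Under this protocol, a queried label only starts to pay off `latency` steps after the query, which is exactly the effect that FS accounts for when estimating a label’s utility.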