Abstract

Stream-based active learning (AL) strategies minimize the labeling effort by querying the labels that improve the classifier's performance the most. So far, these strategies neglect the fact that an oracle or expert requires time to provide a queried label. We show that existing AL methods deteriorate or even fail under the influence of such verification latency. The problem with these methods is that they estimate a label's utility based on the currently available labeled data. However, by the time this label arrives, some of the current data may have become outdated and new labels may have arrived. In this article, we propose to simulate the data that will be available at the time the label arrives. To this end, our method Forgetting and Simulating (FS) forgets outdated information and simulates the delayed labels to obtain more realistic utility estimates. We assume that each label's arrival date is known a priori and that the classifier's training data are bounded by a sliding window. Our extensive experiments show that FS improves stream-based AL strategies in settings with both constant and variable verification latency.
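The forgetting-and-simulating idea described above can be sketched as follows. This is an illustrative sketch under assumed data structures, not the paper's implementation; all names (`simulate_future_window`, `labeled`, `pending`, `predict`) are hypothetical:

```python
def simulate_future_window(labeled, pending, t_arrival, window_size, predict):
    """Sketch of the FS idea; names and structures are illustrative.

    labeled     : list of (t, x, y) already-labeled instances, time-ordered
    pending     : list of (t_due, x) queried labels still in transit
    t_arrival   : time at which the candidate's label would arrive
    window_size : length of the sliding window bounding the training data
    predict     : current classifier, used to simulate still-missing labels
    """
    # Forgetting: drop labeled instances that will have left the
    # sliding window by the time t_arrival is reached.
    kept = [(t, x, y) for (t, x, y) in labeled if t > t_arrival - window_size]
    # Simulating: pending labels that would arrive before t_arrival are
    # filled in with the current classifier's predictions.
    simulated = [(t_due, x, predict(x))
                 for (t_due, x) in pending if t_due <= t_arrival]
    return kept + simulated
```

A utility estimate computed on this simulated set should better reflect the situation at the moment the queried label actually becomes usable.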

Highlights

  • This article addresses data stream classification in non-stationary environments, where instances appear successively and are initially unlabeled. To learn a classifier, we need labels for at least some of these instances

  • The plots show that Random Selection (Rand) outperforms the active learning (AL) strategies under high latency

  • The results suggest that the performance of traditional AL strategies decreases when compared to selecting instances randomly


Summary

Introduction

This article addresses data stream classification in non-stationary environments, where instances appear successively and are initially unlabeled. We need labels for at least some of these instances. We can select instances to be passed to an oracle for labeling, e.g., a human expert or a computationally intensive simulation. Such a label acquisition induces some sort of cost, which we (for simplicity) assume to be equal across all instances. Algorithms from the field of stream-based active learning (AL) aim to maximize the classifier's performance under the given budget restrictions by selecting only the most informative instances for labeling. Similar to Zliobaite et al. (2014), we follow the common assumption that the AL strategy needs to decide immediately, at the moment an instance arrives, whether or not to acquire its label.
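The immediate-decision setting described above can be sketched as a simple query loop. This is a minimal illustration under assumed names (`stream_al_loop`, `utility`), not the paper's code:

```python
def stream_al_loop(stream, utility, budget=0.1, threshold=0.5):
    """Minimal stream-based AL skeleton (illustrative).

    stream    : iterable of arriving instances
    utility   : function x -> estimated usefulness of labeling x
    budget    : fraction of arriving instances we may query
    threshold : minimum utility required to spend budget on a query
    """
    queried, seen = [], 0
    for x in stream:
        seen += 1
        # The decision must be made now, as the instance arrives:
        # query only if budget remains and the utility is high enough.
        within_budget = len(queried) < budget * seen
        if within_budget and utility(x) >= threshold:
            queried.append(x)  # the label request goes to the oracle here
    return queried
```

Under verification latency, the label requested in the loop body only arrives later, which is exactly why utility estimates based on the current data can mislead the selection.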


