Active learning is an important class of machine learning in which labels are queried only when necessary. Most active learning algorithms need to iteratively retrain the classifier when new labeled data are obtained. Such a batch learning process can incur a high overhead in both time and memory. In this paper, we propose a new online active learning algorithm for binary classification. Our algorithm uses the margin-based criterion, which compares the margin of each instance with a threshold to decide whether its label should be queried. In particular, we propose Iteratively Decreased Threshold (IDT), a new threshold update method for the margin-based criterion. By iteratively decreasing the threshold with IDT, our algorithm can effectively reduce the number of queried instances. In addition, as evaluating the margin-based criterion involves only simple inner products, our algorithm is also computationally efficient. We compare our algorithm with other state-of-the-art online active learning algorithms on six data sets, demonstrating that it requires fewer queries to achieve the same classification accuracy while incurring lower computational overhead.
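To make the margin-based criterion concrete, the following is a minimal sketch of an online learner that queries a label only when the margin falls at or below a threshold. It is not the IDT method itself: the abstract does not state the update rule, so the multiplicative `decay` applied after each query, the perceptron-style weight update, and all names (`online_margin_learner`, `threshold`, `decay`) are assumptions made purely for illustration.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def online_margin_learner(stream, dim, threshold=1.0, decay=0.9):
    """Process (x, y) pairs one at a time, reading y only when queried.

    Assumptions (not from the paper): a linear model, a perceptron-style
    update, and a threshold that shrinks by `decay` after every query.
    """
    w = [0.0] * dim
    queries = 0
    for x, y in stream:
        score = dot(w, x)              # one inner product per instance
        if abs(score) <= threshold:    # low margin: the label is worth querying
            queries += 1
            if y * score <= 0:         # update only on a mistake
                w = [wi + y * xi for wi, xi in zip(w, x)]
            threshold *= decay         # hypothetical decreasing schedule
    return w, queries, threshold


# Toy stream: two linearly separable points, repeated.
stream = [([1.0, 0.0], 1), ([-1.0, 0.0], -1)] * 50
w, queries, threshold = online_margin_learner(stream, dim=2)
```

On this toy stream the first instance is queried and every later one is skipped, because its margin exceeds the reduced threshold. Real data would need a schedule chosen with more care than the fixed decay assumed here.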