Abstract

Traditional active learning methods have achieved good results on classification tasks with few categories, such as binary classification, but applying active learning to big-data problems still faces major challenges. Many active learning query strategies require matrix inversion, whose computational cost grows rapidly with problem size (standard matrix inversion scales cubically with the matrix dimension), so these methods are difficult to apply to large-scale multi-category classification tasks. To address this problem, this paper designs a subsampling-based active learning model that integrates an unsupervised clustering algorithm with traditional active learning methods, and evaluates it on the Binary Alphadigits and OMNIGLOT data sets. The paper compares the performance of five traditional active learning algorithms under this subsampling scheme: random sampling, uncertainty sampling, query-by-committee, density weighting, and learning-based active learning. The comparative experiments verify that subsampling-based active learning is feasible for multi-category classification, and show that the subsampling-based method can overcome the inability of traditional active learning methods to handle large-scale data classification.
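To make the idea of clustering-based subsampling concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not name the clustering algorithm, the subsample size, or the classifier, so this sketch assumes plain k-means clustering, a fixed per-cluster sampling quota, and least-confidence uncertainty sampling (one of the five strategies listed). The function names `kmeans`, `cluster_subsample`, and `least_confidence_query` are hypothetical. The point it illustrates is that the query strategy scores only the small candidate subsample rather than the full unlabeled pool, so its cost depends on the subsample size.

```python
import numpy as np


def kmeans(X, k, n_iter=20, rng=None):
    """Plain Lloyd's k-means (assumed clustering step); returns a label per row of X."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    # Initialize centers with k distinct random points from the pool.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return labels


def cluster_subsample(X_pool, k, per_cluster, rng=None):
    """Hypothetical subsampling step: draw up to `per_cluster` pool indices from each cluster."""
    rng = np.random.default_rng(rng)
    labels = kmeans(X_pool, k, rng=rng)
    picked = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        take = min(per_cluster, len(idx))
        if take:
            picked.extend(rng.choice(idx, size=take, replace=False))
    return np.array(sorted(picked), dtype=int)


def least_confidence_query(proba, candidates, batch):
    """Uncertainty sampling restricted to the subsample.

    proba: (n_pool, n_classes) class probabilities from the current classifier.
    Returns the `batch` candidate indices whose top-class probability is lowest.
    """
    conf = proba[candidates].max(axis=1)
    order = np.argsort(conf, kind="stable")
    return candidates[order[:batch]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic toy pool of three 2-D Gaussian blobs (not the paper's data sets).
    X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in (0.0, 3.0, 6.0)])
    cand = cluster_subsample(X, k=3, per_cluster=5, rng=rng)
    # Stand-in probabilities; in practice these come from the trained classifier.
    proba = rng.dirichlet(np.ones(4), size=len(X))
    print(least_confidence_query(proba, cand, batch=3))
```

Swapping in another query strategy (for example query-by-committee or density weighting) would only change the scoring function applied to `candidates`; the subsampling step stays the same.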
