Active Learning Query Strategies Research Articles

Supervised machine learning tasks often require a large amount of labeled training data to build a model, and prediction (for example, classification) is then carried out with this model. A tremendous amount of data is now available on the web and in data warehouses, but only a portion of it is annotated, and labeling can be tedious, expensive, and time-consuming. Active learning addresses this problem by reducing the labeling cost: the learning system iteratively selects the data from which it learns. In a special case of active learning, the process starts from a zero-initialized scenario in which the labeled training dataset is empty, so only unsupervised methods can be applied. This paper presents a novel query strategy framework for this problem, the Clustering Based Balanced Sampling Framework (CBBSF), which not only selects the initial labeled training dataset but also selects items uniformly across the categories to obtain a balanced labeled training dataset. The framework includes an assignment technique that implicitly determines the class membership probabilities. The assignment solution is updated during the CBBSF iterations, so it simulates supervised machine learning more accurately as the process progresses. The proposed Spectral Clustering Based Sampling (SCBS) query strategy realizes the CBBSF framework and is therefore applicable in the zero-initialized setting. This selection approach uses ClusterGAN (Clustering using Generative Adversarial Networks) integrated into the spectral clustering algorithm, and then selects an unlabeled instance according to the class membership probabilities. Global and local versions of SCBS were developed, and both most-confident and minimal-entropy measures were calculated, so four SCBS variants were examined in total. Experimental evaluation was conducted on the MNIST dataset, and the results showed that SCBS outperforms the state-of-the-art zero-initialized active learning query strategies.
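As a rough illustration of how class membership probabilities can drive a balanced selection with either a most-confident or a minimal-entropy measure, the sketch below may help. It is not the authors' SCBS implementation: the probabilities are assumed to be given already (in the paper they come from clustering plus the assignment step), and the names `balanced_select`, `most_confident`, and `neg_entropy` are hypothetical.

```python
import math

def most_confident(probs):
    """Confidence score: the largest class membership probability."""
    return max(probs)

def neg_entropy(probs):
    """Negated Shannon entropy, so that a higher score means more certain."""
    return sum(p * math.log(p) for p in probs if p > 0)

def balanced_select(membership, per_class, score=most_confident):
    """Pick up to `per_class` instances for each class, highest score first.

    membership: list of per-instance probability lists (one entry per class).
    Returns a dict mapping class index -> list of selected instance indices.
    """
    n_classes = len(membership[0])
    by_class = {c: [] for c in range(n_classes)}
    for i, probs in enumerate(membership):
        # Assumption: each instance is attributed to its most probable class.
        predicted = max(range(n_classes), key=lambda c: probs[c])
        by_class[predicted].append((score(probs), i))
    selected = {}
    for c, items in by_class.items():
        # Sort by descending score; ties are broken by instance index.
        items.sort(key=lambda t: (-t[0], t[1]))
        selected[c] = [i for _, i in items[:per_class]]
    return selected

# Toy membership probabilities for five unlabeled instances and two classes.
membership = [
    [0.90, 0.10],
    [0.60, 0.40],
    [0.20, 0.80],
    [0.45, 0.55],
    [0.75, 0.25],
]
picked = balanced_select(membership, per_class=1)
```

Selecting the same number of items per class is what keeps the labeled set balanced; swapping `score=neg_entropy` in for the default changes only how certainty is measured, not the balancing.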

Active learning is a category of partially supervised algorithms distinguished by its strategy of combining the predictive ability of a base learner with human knowledge in order to exploit unlabeled data adequately. Its aim is to build powerful learning algorithms that would otherwise rest on insufficient labeled samples alone. Since human labeling can incur substantial monetary and time costs, the human contribution should be kept small relative to that of the base learner. For this reason, we investigate the use of the Logitboost wrapper classifier, a popular ensemble algorithm that applies boosting with a regression base learner built on model trees, within 3 different active learning query strategies. We study its efficiency against 10 separate learners under a well-described active learning framework over 91 datasets, which have been split into binary and multi-class problems. We also included one typical Logitboost variant with a different internal regressor to isolate the benefit of adopting a more accurate regression tree rather than one-node trees, and we examined the effect of one hyperparameter of the proposed algorithm. Since boosting may yield less biased predictions overall, we assume that the proposed algorithm, named Logitboost(M5P), could provide both accurate and robust decisions under active learning scenarios, which would be beneficial in real-life weakly supervised classification tasks. Its smoother weighting of misclassified cases during training and the accurate behavior of M5P are the main factors behind this performance. Appropriate statistical comparisons on classification accuracy verify our assumptions, and adopting M5P instead of weak decision trees proved more competitive for the majority of the examined problems. We present our results through appropriate summaries and explanatory visualizations, commenting on the results case by case.
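The abstract does not name its three query strategies. Purely as an illustration of what a query strategy does with a base learner's predicted class probabilities, the sketch below shows three commonly used uncertainty measures (least confidence, smallest margin, entropy). The function names and the toy probabilities are assumptions, not taken from the paper, and no Logitboost(M5P) model is involved.

```python
import math

def least_confidence(probs):
    """Uncertainty as one minus the highest predicted probability."""
    return 1.0 - max(probs)

def smallest_margin(probs):
    """Uncertainty as one minus the gap between the two top probabilities."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)

def entropy(probs):
    """Shannon entropy of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def query(pool_probs, strategy, batch_size=1):
    """Return indices of the `batch_size` most uncertain pool instances."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: (-strategy(pool_probs[i]), i),  # ties broken by index
    )
    return ranked[:batch_size]

# Toy predicted probabilities for three unlabeled pool instances.
pool_probs = [
    [0.70, 0.20, 0.10],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
]
to_label = query(pool_probs, least_confidence)
```

In a pool-based loop, the returned indices would be sent to a human annotator, the newly labeled instances added to the training set, and the base learner retrained before the next query. The measures can disagree: on this toy pool, least confidence and entropy prefer the second instance, while the margin measure prefers the third.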

Articles published on Active Learning Query Strategies

Retracted: Active Learning Query Strategies for Linear Regression Based on Efficient Global Optimization

Onception: Active Learning with Expert Advice for Real World Machine Translation

Toward Label-Efficient Neural Network Training: Diversity-Based Sampling in Semi-Supervised Active Learning

A framework to build accurate Convolutional Neural Network models for melanoma diagnosis

On the application of active learning for efficient and effective IoT botnet detection

Active Learning Query Strategies for Linear Regression Based on Efficient Global Optimization

Zero Initialized Active Learning with Spectral Clustering using Hungarian Method

A Comparative Analysis of Active Learning for Biomedical Text Mining

Investigation of Combining Logitboost(M5P) under Active Learning Classification Tasks

Multi-category Classification Problem Oriented Subsampling-Based Active Learning Method

SEAL: Semisupervised Adversarial Active Learning on Attributed Graphs.

Active Learning Query Strategies for Classification, Regression, and Clustering: A Survey

Overlap Aware Active Learning Query Strategies for Pool Based Scenario

On active learning methods for manifold data

Active learning reduces annotation time for clinical concept extraction

Low-Resource Active Learning of Morphological Segmentation

