Abstract

We propose a novel unified framework for automated distributed active learning (AutoDAL) that addresses several challenging problems in active learning: limited labeled data, imbalanced datasets, automatic hyperparameter selection, and scalability to big data. First, automated graph-based semi-supervised learning is conducted by aggregating the proposed cost functions from different compute nodes and jointly optimizing the hyperparameters of both the classification and query selection stages. For dense datasets, a clustering-based uncertainty sampling with maximum entropy (CME) loss is applied in the optimization. For sparse and imbalanced datasets, the shrinkage-optimized KL-divergence regularization and local-selection-based active learning (SOAR) loss is adapted to AutoDAL. The optimization is solved efficiently by iterating between a genetic algorithm (GA) refined with a local generating set search (GSS) and the solution of an integer linear programming (ILP) problem. Moreover, we propose an efficient distributed active learning algorithm that scales to big data. The proposed AutoDAL algorithm is applied to multiple benchmark datasets and to two real-world classification datasets: an electrocardiogram (ECG) dataset and a credit fraud detection dataset. We demonstrate that AutoDAL achieves significantly better performance than several state-of-the-art AutoML approaches and active learning algorithms.
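To make the idea of clustering-based uncertainty sampling concrete, the following is a minimal sketch of a generic entropy-based query selector. It is not the CME loss or the AutoDAL optimization itself, whose exact form the abstract does not give. The function names `predictive_entropy` and `select_queries`, the "at most one query per cluster" rule, and the assumption that cluster assignments are already available are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(prob_rows: Sequence[Sequence[float]],
                   cluster_ids: Sequence[int],
                   budget: int) -> List[int]:
    """Pick up to `budget` unlabeled samples to send for labeling.

    Illustrative rule (an assumption, not the paper's method): keep the
    highest-entropy sample in each cluster, then return the most
    uncertain of those representatives.
    """
    # cluster id -> (entropy, sample index) of its most uncertain sample
    best_per_cluster: Dict[int, Tuple[float, int]] = {}
    for idx, (probs, cid) in enumerate(zip(prob_rows, cluster_ids)):
        h = predictive_entropy(probs)
        if cid not in best_per_cluster or h > best_per_cluster[cid][0]:
            best_per_cluster[cid] = (h, idx)

    # Highest entropy first; ties are broken by the lower sample index.
    ranked = sorted(best_per_cluster.values(), key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in ranked[:budget]]


if __name__ == "__main__":
    # Made-up class probabilities for four unlabeled samples in two clusters.
    probs = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.99, 0.01]]
    clusters = [0, 0, 1, 1]
    print(select_queries(probs, clusters, budget=2))
```

Restricting each cluster to one query is one simple way to keep the selected batch diverse; a full system would also have to tune the clustering and classifier hyperparameters, which is the part the abstract assigns to the GA, GSS, and ILP steps.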
