Particle Swarm Optimization Based Swarm Intelligence for Active Learning Improvement: Application on Medical Data Classification

Nawel Zemmal, Nabiha Azizi, Amel Ziani, Monther Aldwairi, Mokhtar Sellami, Nadjette Dendani, Soraya Cheriguene

doi:10.1007/s12559-020-09739-z

Abstract

Semi-supervised learning targets the common situation where labeled data are scarce but unlabeled data are abundant: it uses unlabeled data to help supervised learning tasks. In practice, it may make sense to combine active learning with semi-supervised learning. That is, the learning algorithm may be allowed to pick a set of unlabeled instances to be labeled by a domain expert, and these are then used as labeled data. However, existing approaches are computationally expensive and require searching through the entire unlabeled dataset, which may contain redundant instances that provide no useful information to the classifier and can degrade its performance. To address this optimization problem, a hybrid system that combines active learning (AL) and particle swarm optimization (PSO) is proposed to reduce the cost of labeling while building a more efficient classifier. The novelty of this work lies in integrating a bio-inspired optimization algorithm into the machine learning strategy. Furthermore, a novel uncertainty measure is integrated into the PSO algorithm as an objective function to select, from massive amounts of medical instances, those deemed most informative. To evaluate its effectiveness, the proposed approach was tested on eighteen (18) benchmark datasets and compared against well-known classifiers representing three learning paradigms: AL–NB, an active learning algorithm using a Naive Bayes classifier and the Margin Sampling strategy; SVM (Support Vector Machine) and ELM (Extreme Learning Machine) with supervised learning; and TSVM (Transductive Support Vector Machine) with semi-supervised learning. Experiments showed that the proposed approach is effective in reducing the effort experts need to annotate medical data in order to produce an accurate classifier. Active learning is thus used to reduce the cost of the expensive labeling task.
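The AL–NB baseline relies on the Margin Sampling query strategy. The following is a minimal sketch of that strategy, assuming the classifier (Naive Bayes or any other) exposes per-class probability estimates for each unlabeled instance. The function names `margin` and `margin_sampling` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities; a small gap means high uncertainty."""
    if len(probs) < 2:
        raise ValueError("margin needs at least two class probabilities")
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def margin_sampling(prob_matrix: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k unlabeled instances with the smallest margin."""
    order = sorted(range(len(prob_matrix)), key=lambda i: margin(prob_matrix[i]))
    return order[:k]


if __name__ == "__main__":
    # Hypothetical class-probability estimates for four unlabeled instances.
    pool = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.3, 0.7]]
    print(margin_sampling(pool, 2))  # prints [1, 2]: the two most ambiguous instances
```

In an active learning loop, the selected indices would be sent to the domain expert for labeling, added to the labeled set, and the classifier retrained before the next query.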
Based on a novel uncertainty measure, the nature-inspired PSO algorithm attempts to select, from massive amounts of unlabeled medical instances, those considered informative while improving classifier performance. The experiments confirm that the proposed strategy significantly enhances the performance of the AL algorithm compared with commonly used uncertainty strategies. It achieves performance similar to that of fully supervised and semi-supervised algorithms while requiring much less labeling. As a future extension of this work, it would be interesting to integrate other evolutionary optimization algorithms and compare them with our approach. It would also be worthwhile to test the impact of other PSO variants in our approach and to include more classification algorithms in the experiments.
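The idea of PSO selecting informative instances can be illustrated with a discrete binary PSO, where each particle is a 0/1 mask over the unlabeled pool and a sigmoid of the velocity gives the probability of selecting each instance. This is a sketch under explicit assumptions. The section does not define the paper's novel uncertainty measure, so a margin-based stand-in (`1 - margin`) is used as the objective. The labeling budget, the `repair` step that keeps at most `budget` instances, and all parameter values (`w`, `c1`, `c2`, `v_max`) are hypothetical choices, not the authors' settings.

```python
import math
import random
from typing import List, Sequence


def uncertainty(probs: Sequence[float]) -> float:
    """Stand-in uncertainty (1 - margin); NOT the paper's novel measure."""
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])


def repair(mask: List[int], scores: Sequence[float], budget: int) -> List[int]:
    """Keep at most `budget` selected instances, preferring the most uncertain ones."""
    chosen = sorted((i for i, bit in enumerate(mask) if bit),
                    key=lambda i: scores[i], reverse=True)
    keep = set(chosen[:budget])
    return [1 if i in keep else 0 for i in range(len(mask))]


def fitness(mask: Sequence[int], scores: Sequence[float]) -> float:
    """Objective to maximize: total uncertainty of the selected instances."""
    return sum(s for s, bit in zip(scores, mask) if bit)


def binary_pso_select(prob_matrix: Sequence[Sequence[float]], budget: int,
                      n_particles: int = 20, n_iter: int = 50, w: float = 0.7,
                      c1: float = 1.5, c2: float = 1.5, v_max: float = 4.0,
                      seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    scores = [uncertainty(p) for p in prob_matrix]
    n = len(scores)
    # Each particle is a repaired 0/1 mask over the unlabeled pool.
    pos = [repair([rng.randint(0, 1) for _ in range(n)], scores, budget)
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, scores) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # velocity clamping
                # Sigmoid transfer: probability that instance d is selected.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            pos[i] = repair(pos[i], scores, budget)
            f = fitness(pos[i], scores)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return [d for d, bit in enumerate(gbest) if bit]


if __name__ == "__main__":
    pool = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.3, 0.7]]
    print(binary_pso_select(pool, budget=2))
```

Under this simple additive objective, the optimum is just the `budget` most uncertain instances, so a swarm search is more than needed here. PSO becomes worthwhile when the objective is non-additive, for example one that also penalizes redundant instances, which matches the motivation given in the abstract.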

Full Text