Assessing the Suitability of Semi-Supervised Learning Datasets using Item Response Theory

Teodor Fredriksson, David Issa Mattos, Helena Holmstrom Olsson, Jan Bosch

doi:10.1109/seaa53835.2021.00049

Abstract

In practice, supervised learning algorithms require fully labeled datasets to achieve the high accuracy demanded by modern applications. However, in industrial settings, supervised learning algorithms can perform poorly because few labeled instances are available. Semi-supervised learning (SSL) is an automatic labeling approach that uses the existing labels to infer the missing labels in partially labeled datasets. The large number of available SSL algorithms and the lack of systematic comparison between them leave practitioners without guidelines for selecting the appropriate one for their application. Moreover, each SSL algorithm is often validated and evaluated on only a small number of common datasets. However, no research has examined which datasets are suitable for comparing different SSL algorithms. The purpose of this paper is to empirically evaluate the suitability of the datasets commonly used to evaluate and compare different SSL algorithms. We performed a simulation study using twelve datasets of three different data types (numerical, text, image) on thirteen different SSL algorithms. The contributions of this paper are twofold. First, we propose the use of a Bayesian congeneric item response theory model to assess the suitability of commonly used datasets. Second, we compare the different SSL algorithms using these datasets. The results show that, with the exception of three datasets, the datasets have very low discrimination factors and are easily solved by the current algorithms. Additionally, the SSL algorithms have overlapping 90% credible intervals, indicating uncertainty in the difference between the accuracy of these SSL models. The paper concludes by suggesting that researchers and practitioners should more carefully consider the choice of datasets used for comparing SSL algorithms.

Full Text