Abstract

The paper is dedicated to the development and comparative experimental analysis of semi-supervised learning approaches, combining unsupervised and supervised techniques, for the classification of datasets with a small amount of labeled data: identifying to which of a set of categories a new observation belongs, using a training set of observations whose category membership is known. Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training; unlabeled data, used together with a small quantity of labeled data, can produce a significant improvement in learning accuracy. The goal is the development and analysis of semi-supervised methods, along with a comparison of their accuracy and robustness on different synthetic datasets. The first proposed approach is based on the unsupervised K-medoids method, also known as the Partitioning Around Medoids algorithm; unlike K-medoids, however, the proposed algorithm first calculates medoids using only labeled data and then processes the unlabeled points, assigning each the label of the nearest medoid. Another proposed approach mixes the supervised K-nearest neighbors method with the unsupervised K-Means method; thus, the resulting learning algorithm uses information about both the nearest points and the classes' centers of mass. The methods were implemented in the Python programming language and experimentally investigated on classification problems using datasets with different distribution and spatial characteristics, generated with the scikit-learn library. The developed approaches were compared by their average accuracy across all these datasets. It was shown that even small amounts of labeled data make semi-supervised learning applicable, and that the proposed modifications improve accuracy and algorithm performance, as demonstrated during the experiments.
As the amount of available label information increases, the accuracy of the algorithms grows. Thus, the developed algorithms use a distance metric that takes the available label information into account.
Keywords: unsupervised learning, supervised learning, semi-supervised learning, clustering, distance, distance function, nearest neighbor, medoid, center of mass.

Highlights

  • A large amount of data has been produced recently, and nowadays humanity has the opportunity to store and process all of it

  • The most efficient approach in machine learning is supervised learning, where we have labeled data and try to learn a function from data points to their labels

  • Unlike K-medoids, the proposed algorithm first calculates medoids using only labeled data and then processes the unlabeled points, assigning each the label of the nearest medoid. This algorithm has the following pros: reduced processing time, because only a few iterations through the points are required, unlike standard K-medoids; and more robustness to wrongly assigned labels, because the algorithm gives higher weight to labeled data in the medoid calculation step. Another proposed approach uses the ideas of the K-nearest neighbors and K-Means algorithms, because classification uses information about both the nearest points and the classes' centers of mass
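The medoid-based step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `semi_supervised_medoids` and the use of Euclidean distance are assumptions for the example.

```python
import numpy as np

def semi_supervised_medoids(X_labeled, y_labeled, X_unlabeled):
    """Sketch: assign each unlabeled point the label of the nearest class medoid.

    The medoid of a class is the labeled point that minimizes the total
    distance to all other labeled points of that class.
    NOTE: hypothetical helper, assumes Euclidean distance.
    """
    medoids = {}
    for label in np.unique(y_labeled):
        pts = X_labeled[y_labeled == label]
        # pairwise distances within the class; the medoid minimizes the row sum
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        medoids[label] = pts[d.sum(axis=1).argmin()]
    labels = np.array(sorted(medoids))
    centers = np.stack([medoids[l] for l in labels])
    # label of the nearest medoid for each unlabeled point
    dist = np.linalg.norm(X_unlabeled[:, None, :] - centers[None, :, :], axis=-1)
    return labels[dist.argmin(axis=1)]
```

Because medoids are computed once from the labeled subset only, the step avoids the repeated full-dataset iterations of standard K-medoids, which matches the reduced-processing-time claim above.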


Summary

Introduction

A large amount of data has been produced recently, and nowadays humanity has the opportunity to store and process all of it. Unlike standard K-medoids, the proposed algorithm first calculates medoids using only labeled data and then assigns each unlabeled point the label of the nearest medoid. This algorithm has the following pros: reduced processing time, because only a few iterations through the points are required, unlike standard K-medoids; and more robustness to wrongly assigned labels, because the algorithm gives higher weight to labeled data in the medoid calculation step. Another proposed approach uses the ideas of the K-nearest neighbors and K-Means algorithms, because classification uses information about both the nearest points and the classes' centers of mass (algorithm 1). The K-nearest-neighbors-based approach has better accuracy in the case of closely located clusters with the same distribution. Another required feature of a semi-supervised algorithm is the dependence of quality on the number of labels: more labels should yield higher quality, and vice versa.
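One way to combine the nearest-point and center-of-mass information mentioned above is to score each class by a weighted mix of the two distances. The sketch below is a plausible reading of that idea, not the paper's algorithm 1: the function name `knn_centroid_classify`, the mixing weight `alpha`, and the specific scoring rule are assumptions.

```python
import numpy as np

def knn_centroid_classify(X_labeled, y_labeled, x, k=3, alpha=0.5):
    """Hypothetical hybrid of K-nearest neighbors and K-Means ideas.

    Each class is scored by mixing the mean distance from x to its k
    nearest labeled points (KNN term) with the distance from x to its
    center of mass (K-Means term); the class with the smallest combined
    score wins. `alpha` balances the two terms and is an assumption.
    """
    best_label, best_score = None, np.inf
    for label in np.unique(y_labeled):
        pts = X_labeled[y_labeled == label]
        d = np.linalg.norm(pts - x, axis=1)
        knn_term = np.sort(d)[: min(k, len(d))].mean()
        centroid_term = np.linalg.norm(pts.mean(axis=0) - x)
        score = alpha * knn_term + (1 - alpha) * centroid_term
        if score < best_score:
            best_label, best_score = label, score
    return best_label
```

With `alpha` close to 1 the rule behaves like a distance-weighted KNN (better for closely located clusters with the same distribution, per the claim above); with `alpha` close to 0 it degenerates to nearest-centroid classification.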

[Figure: classification accuracy on the Moons, Aniso, and Varied synthetic datasets]
Conclusions

