Abstract

The medical domain has always been important, since good health is a universal goal. People search for medical documents in vast bodies of data and information, such as the web. To support information retrieval and knowledge dissemination through the web, we analyze the use of semi-supervised learning to classify medical-related documents. We chose semi-supervised learning to show that good classifiers can be built with limited human supervision. In this research, we combine Naïve Bayes with the pseudo-labeling technique. We vary the labeled:unlabeled ratio of the training dataset across 4:3, 3:4, 2:5, and 1:6 to observe how semi-supervised learning performs under different levels of human supervision. The average classification accuracy is similar across all ratios (81%-83%). Interestingly, in one experiment, the highest accuracy obtained with the 1:6 ratio (85%) exceeds that of the 2:5 ratio (82%) and matches that of the 4:3 ratio (85%). However, the standard deviation of the accuracy is highest for the 1:6 ratio (4.183). These results suggest that semi-supervised learning can produce a good classifier for medical documents in Bahasa Indonesia with less human supervision.
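
The combination of Naïve Bayes and pseudo-labeling described above can be illustrated with a minimal sketch. The abstract does not specify the feature extraction, the confidence threshold, or the number of pseudo-labeling rounds, so everything here is an assumption made for illustration: whitespace tokenization, multinomial Naïve Bayes with Laplace smoothing, a hypothetical threshold of 0.6, and a handful of made-up toy documents. The function names (`train_nb`, `predict_nb`, `pseudo_label`) are likewise hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect the counts needed for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()  # assumption: whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return (most probable label, its posterior probability)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in doc.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothed log likelihood
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Turn log scores into normalized probabilities
    top = max(log_scores.values())
    probs = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(probs.values())
    best = max(probs, key=probs.get)
    return best, probs[best] / norm


def pseudo_label(labeled_docs, labels, unlabeled_docs,
                 threshold=0.6, max_rounds=5):
    """Repeatedly add confidently predicted unlabeled docs to the training set."""
    docs, labs = list(labeled_docs), list(labels)
    pool = list(unlabeled_docs)
    for _ in range(max_rounds):
        model = train_nb(docs, labs)
        remaining = []
        for doc in pool:
            label, confidence = predict_nb(model, doc)
            if confidence >= threshold:
                docs.append(doc)    # accept the prediction as a pseudo-label
                labs.append(label)
            else:
                remaining.append(doc)
        if len(remaining) == len(pool):  # nothing new was accepted
            break
        pool = remaining
    return train_nb(docs, labs)


# Made-up toy documents in Bahasa Indonesia (not from the paper's dataset)
labeled = ["pasien demam obat dokter", "gol pertandingan sepak bola"]
labels = ["medical", "other"]
unlabeled = [
    "dokter periksa pasien batuk",
    "pertandingan bola malam ini",
    "obat batuk untuk pasien",
]

model = pseudo_label(labeled, labels, unlabeled)
print(predict_nb(model, "batuk demam"))
```

In this sketch the word "batuk" never appears in the labeled documents, so it can only influence the final model through the pseudo-labeled ones. Lowering the labeled share of the training set, as in the 1:6 setting, makes the final model depend more heavily on such pseudo-labels, which is one plausible reason for the higher variance reported for that ratio.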
