Abstract

The medical domain is of enduring importance because good health is a universal goal. People look for medical documents in large pools of data and information, such as the web. To support information retrieval and knowledge dissemination through the web, we analyze the use of semi-supervised learning to classify medical-related documents. We choose semi-supervised learning to show that good classifiers can be built with limited human supervision. In this research, we combine a Naïve Bayes classifier with the pseudo-labeling technique. We test four labeled:unlabeled ratios in the training dataset (4:3, 3:4, 2:5, and 1:6) to see how semi-supervised learning performs under different levels of human supervision. The average classification accuracy is similar across the ratios (81%-83%). Interestingly, in one experiment, the highest accuracy obtained with the 1:6 ratio (85%) exceeds that of the 2:5 ratio (82%) and equals that of the 4:3 ratio (85%). However, the 1:6 ratio also has the highest standard deviation of accuracy among the ratios (4.183). We conclude that semi-supervised learning can produce an effective classifier for medical-domain documents in Bahasa Indonesia with less human supervision.
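
As a rough illustration of how Naïve Bayes and pseudo-labeling can be combined, the following is a minimal sketch and not the authors' implementation. The abstract does not state the Naïve Bayes variant, the tokenization, the confidence threshold, or the number of pseudo-labeling rounds, so the multinomial model with add-one smoothing, whitespace tokenization, `threshold=0.8`, and `max_rounds=5` are all assumptions. The toy documents and the class names `medical` and `other` are hypothetical, arranged in a 4:3 labeled:unlabeled split.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (assumed variant) on raw text."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()  # assumed: plain whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_proba(model, doc):
    """Return {label: posterior probability} for a single document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in doc.lower().split():
            if token in vocab:  # words never seen in training are skipped
                count = word_counts[label][token] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


def pseudo_label(labeled_docs, labels, unlabeled_docs, threshold=0.8, max_rounds=5):
    """Grow the training set with confident predictions, then retrain."""
    docs, y = list(labeled_docs), list(labels)
    pool = list(unlabeled_docs)
    model = train_nb(docs, y)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for doc in pool:
            probs = predict_proba(model, doc)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:  # assumed confidence cut-off
                docs.append(doc)
                y.append(best)  # the prediction becomes a pseudo label
                added += 1
            else:
                remaining.append(doc)
        pool = remaining
        if added == 0:
            break  # nothing confident enough was found, so stop early
        model = train_nb(docs, y)
    return model, pool


# Hypothetical toy data in Bahasa Indonesia (4 labeled, 3 unlabeled).
labeled = [
    "dokter memeriksa pasien demam",
    "obat batuk untuk pasien",
    "harga saham naik tajam",
    "pertandingan sepak bola malam ini",
]
labels = ["medical", "medical", "other", "other"]
unlabeled = [
    "pasien demam minum obat",
    "harga saham bank naik",
    "cuaca cerah hari ini",
]

model, leftover = pseudo_label(labeled, labels, unlabeled)
probs = predict_proba(model, "dokter memberi obat demam")
print(max(probs, key=probs.get), leftover)
```

Documents that never reach the threshold stay in `leftover` and are not used for training. Raising the threshold trades a smaller pseudo-labeled set for fewer wrong labels, which is one plausible reason accuracy could vary more when the labeled share is small.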
