Semi-supervised method for improving general-purpose and domain-specific textual corpora labels

Igor Babikov,Sergey Kovalchuk,Ivan Soldatov

doi:10.1016/j.procs.2023.12.018

Open Access

https://doi.org/10.1016/j.procs.2023.12.018

Journal: Procedia Computer Science
Publication Date: Jan 1, 2023
License type: CC BY-NC-ND

#High Quality Labels #Weak Labels

Abstract

The study addresses the problem of label quality assurance and labeling cost minimization. High-quality labels are important for building efficient supervised machine learning pipelines. Such labels have proved costly to acquire; however, one can work with weaker forms of supervision that may help create a reliable set of ground truth labels for classifiers. The authors present a semi-supervised approach that uses pretrained language models to increase the quality of supervision information in general-purpose as well as domain-specific document sets where weak labels and unlabeled data are present.

Full Text