Abstract

This study addresses label quality assurance and labeling cost minimization. High-quality labels are essential for building efficient supervised machine learning pipelines, but they are costly to acquire. Weaker forms of supervision can nevertheless help create a reliable set of ground-truth labels for classifiers. The authors present a semi-supervised approach that uses pretrained language models to improve the quality of supervision information in general-purpose as well as domain-specific document sets where weak labels and unlabeled data are present.
