ReCom: A deep reinforcement learning approach for semi-supervised tabular data labeling

Guy Zaks, Gilad Katz

doi:10.1016/j.ins.2021.12.076

Abstract

One of the main obstacles in applying machine learning to a new domain is the limited availability of labeled data. A common approach for overcoming this challenge is semi-supervised learning, in which labeled and unlabeled data are used together to label additional samples. One of the most common automatic labeling approaches is co-training, which trains two learners on different views of the data and then uses them to collaboratively and iteratively label additional samples. Despite their effectiveness in multiple domains, existing co-training approaches for tabular data are either heuristic, and therefore error-prone, or greedy, which leads to sub-optimal performance. We present ReCom, a co-training approach based on deep reinforcement learning. Our approach models multiple aspects of both the dataset and the two learners, and develops advanced labeling strategies that achieve state-of-the-art performance. ReCom overcomes the challenge of limited data availability by training simultaneously on multiple datasets, thus producing a generic and robust labeling policy that can be applied to new datasets without any additional training. Our experiments, conducted on a diverse group of 32 datasets, demonstrate the merits of our approach.
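For readers unfamiliar with co-training, the following minimal Python sketch illustrates the generic, confidence-based loop that the abstract describes. It is not ReCom itself. It assumes two disjoint feature views and uses a hypothetical nearest-centroid learner (`fit_centroids`, `predict_with_confidence`) whose confidence score is the distance margin between the nearest and runner-up class centroids. All function names and parameters here are illustrative. In each round, each learner greedily labels its `k` most confident unlabeled samples, and these are added to a labeled pool that both learners share.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def project(x: Vector, view: Sequence[int]) -> Vector:
    """Keep only the features that belong to one view."""
    return [x[f] for f in view]


def fit_centroids(X: Sequence[Vector], y: Sequence[int]) -> Dict[int, Vector]:
    """Fit a nearest-centroid learner: the mean vector of each class."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_with_confidence(model: Dict[int, Vector], x: Vector) -> Tuple[int, float]:
    """Return the nearest class and the distance margin to the runner-up class."""
    ranked = sorted((math.dist(x, mu), c) for c, mu in model.items())
    best_dist, best_class = ranked[0]
    margin = ranked[1][0] - best_dist if len(ranked) > 1 else math.inf
    return best_class, margin


def co_train(X: Sequence[Vector], y: Sequence[Optional[int]],
             view_a: Sequence[int], view_b: Sequence[int],
             k: int = 1, iterations: int = 10) -> List[Optional[int]]:
    """Confidence-based co-training; None in y marks an unlabeled sample."""
    labels = list(y)  # copy so the caller's labels are not modified
    pool = [i for i, lab in enumerate(labels) if lab is None]
    for _ in range(iterations):
        for view in (view_a, view_b):
            if not pool:
                return labels
            # Retrain this view's learner on the current shared labeled pool.
            labeled = [i for i, lab in enumerate(labels) if lab is not None]
            model = fit_centroids([project(X[i], view) for i in labeled],
                                  [labels[i] for i in labeled])
            scored = []
            for i in pool:
                label, conf = predict_with_confidence(model, project(X[i], view))
                scored.append((conf, i, label))
            # Fixed greedy rule: label the k most confident samples.
            scored.sort(reverse=True)
            for _conf, i, label in scored[:k]:
                labels[i] = label
                pool.remove(i)
    return labels


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [0.2, 0.1], [4.8, 5.2]]
    y = [0, None, 1, None, None, None]  # None marks an unlabeled sample
    print(co_train(X, y, view_a=[0], view_b=[1], k=1, iterations=5))
    # prints [0, 0, 1, 1, 0, 1]
```

The selection step, which always takes the top-`k` samples by a hand-crafted confidence score, is the kind of fixed heuristic the abstract contrasts with ReCom's learned labeling policy.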

Full Text