Abstract

Partial multi-label learning (PML) models the scenario where each training sample is annotated with a set of candidate labels, but only a subset of them corresponds to the ground-truth labels. The key challenge for PML is how to minimize the negative impact of the incorrect labels concealed within the candidate ones. Most existing PML solutions require abundant samples to train a noise-robust multi-label predictor. However, due to privacy, safety or ethical issues, we often have only a handful of training samples for the target task. In this paper, we propose an approach named FsPML-DA (Few-shot Partial Multi-Label Learning with Data Augmentation) to simultaneously estimate label confidence, perform data augmentation and induce a multi-label classifier. Specifically, FsPML-DA disambiguates the label confidence vector of each PML sample by jointly modeling the feature and semantic similarity, the label credibility of other samples and label co-occurrence. Next, FsPML-DA introduces a synthetic feature network to generate more training samples from pairs of given samples with their label confidence values. FsPML-DA then leverages both the original and generated samples to train a noise-tolerant multi-label classifier. Extensive experiments on benchmark datasets show that FsPML-DA outperforms recent competitive PML baselines and few-shot solutions. FsPML-DA can dislodge noisy labels by sensibly mining PML data, and the proposed data augmentation strategy effectively combats the scarcity of few-shot training samples.
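The abstract does not specify the architecture of the synthetic feature network, so the following is only a minimal sketch of the general idea it describes: generating a new training sample from a pair of existing samples by interpolating both their features and their label-confidence vectors. The mixup-style interpolation, the Beta-distributed mixing coefficient, and the function name `synthesize_pair` are all illustrative assumptions, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_pair(x_i, x_j, c_i, c_j, alpha=0.75):
    """Sketch of pairwise sample synthesis for few-shot PML.

    x_i, x_j : feature vectors of two training samples
    c_i, c_j : their (disambiguated) label-confidence vectors
    alpha    : Beta-distribution parameter for the mixing weight
               (an assumed choice; the paper's network is learned,
               not a fixed interpolation like this)

    Returns a synthetic feature vector and its confidence vector,
    each a convex combination of the inputs.
    """
    lam = rng.beta(alpha, alpha)          # mixing coefficient in (0, 1)
    x_new = lam * x_i + (1 - lam) * x_j   # interpolated features
    c_new = lam * c_i + (1 - lam) * c_j   # interpolated label confidences
    return x_new, c_new
```

Under this sketch, the synthetic sample inherits soft supervision: each entry of `c_new` lies between the corresponding confidences of the two parents, so labels that both parents deem unlikely stay unlikely in the generated sample.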
