Rare category exploration with noisy labels

Haiqin Weng, Kevin Chiew, Zhenguang Liu, Qinming He, Roger Zimmermann

doi:10.1016/j.eswa.2018.07.050

Abstract

Starting from a few labelled data examples as seeds, rare category exploration (RCE) aims to find the target rare category hidden in a given dataset. However, conventional RCE approaches are very sensitive to noisy labels, and noise in manually generated labels is almost inevitable. To address this deficiency, this paper investigates the RCE process in the presence of noisy labels, which to the best of our knowledge has not yet been studied in depth. Based on the assumption that only one labelled data example of the rare category is known to be correctly labelled, while the other few labelled examples may be wrongly labelled, we first propose a label propagation based algorithm, SLP, to extract the coarse shape of the rare category. We then refine the result with a mixture-information based propagation model, RLP. Extensive experiments on six real-world datasets show that our method outperforms state-of-the-art RCE approaches. We also show that even with 20% noisy labels, our method achieves satisfactory accuracy.
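To make the idea of propagating from a single trusted seed concrete, the following is a minimal sketch of generic graph-based label propagation. It is not the SLP or RLP algorithm, whose details the abstract does not give. The function name `propagate_from_seed`, the Gaussian affinity, and the parameters `sigma`, `alpha` and `n_iter` are all assumptions made for illustration.

```python
import numpy as np


def propagate_from_seed(X, seed_index, sigma=1.0, alpha=0.9, n_iter=200):
    """Generic iterative label propagation from one trusted seed.

    Illustrative only: this is a standard normalised-graph propagation
    scheme, not the SLP/RLP method of the paper. All parameter names and
    defaults are hypothetical.
    """
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances between all examples.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    # Gaussian affinities, with self-similarity set to zero.
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    degree = np.maximum(W.sum(axis=1), 1e-12)  # guard against isolated points
    inv_sqrt_degree = 1.0 / np.sqrt(degree)
    S = W * inv_sqrt_degree[:, None] * inv_sqrt_degree[None, :]

    # Only the single trusted seed carries initial label mass.
    y = np.zeros(len(X))
    y[seed_index] = 1.0

    # Repeatedly spread scores to neighbours while pulling back to the seed.
    f = y.copy()
    for _ in range(n_iter):
        f = alpha * (S @ f) + (1.0 - alpha) * y
    return f
```

Calling `propagate_from_seed(X, seed_index=0)` returns one non-negative score per example; under these assumptions, examples that are densely connected to the seed receive higher scores, and thresholding the scores yields a rough candidate set for the rare category.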

Full Text