Abstract

Real-world data often exhibit a long-tailed label distribution, which leads to classification bias. Popular re-sampling and re-weighting methods usually require the category information to be known in advance, so learning from long-tailed data with open categories remains challenging. In this paper, we propose an active distribution optimization algorithm (DALC) to address this problem. Through iterations of clustering, querying, and classification, DALC explores new categories and balances the label distribution. For clustering, we present an exploration technique that adaptively obtains the optimal data distribution with minimal total distance (cost). For querying, we design a critical-instance selection strategy that uses the cluster information. For classification, we build an ensemble model that continuously balances the label distribution. We conducted experiments on synthetic, benchmark, and domain datasets. Significance tests verify the effectiveness of DALC and its superiority over state-of-the-art long-tailed classification and open-set classification algorithms.
