Hierarchical classification of data with long-tailed distributions via global and local granulation

Hong Zhao, Shunxin Guo, Yaojin Lin

doi:10.1016/j.ins.2021.09.059

Abstract

Automated learning from datasets with long-tailed distributions has become a research hotspot as large-scale real-world datasets grow more complex. Existing solutions to long-tailed data classification usually rely on re-balancing strategies for global optimization. These can achieve satisfactory results, but they tend to alter the original data. In this paper, we propose a knowledge granulation method based on global and local granulation to assist the hierarchical classification of long-tailed data without altering the original data. First, a global classifier is constructed based on the hierarchical structure of the WordNet knowledge organization and is used to granulate the global data from coarse to fine. Second, a local hierarchical classifier is constructed for the tail classes, which contain few samples. The hierarchical structure of this local classifier is obtained by granulating the data via spectral clustering rather than by using the semantic hierarchy of classes. Finally, the global classifier preliminarily classifies each sample, and uncertain samples are then further classified by the tail local classifier. Experimental results show that the proposed method outperforms several state-of-the-art models designed for the hierarchical classification of long-tailed data.
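
The final step described in the abstract, in which the global classifier makes a preliminary decision and uncertain samples are passed to the tail local classifier, can be illustrated with a minimal sketch. The abstract does not state how uncertainty is measured, so the sketch assumes a simple rule: a sample is uncertain when the global classifier's highest class probability falls below a cutoff. The function name `route_predictions`, the `threshold` parameter, and the `tail_classes` mapping are hypothetical and introduced only for illustration. The construction of the two classifiers, including the spectral clustering step, is not shown.

```python
import numpy as np


def route_predictions(global_probs, tail_probs, tail_classes, threshold=0.5):
    """Combine global predictions with a tail-class local classifier.

    global_probs: (n_samples, n_classes) probabilities from the global classifier.
    tail_probs:   (n_samples, n_tail) probabilities from the tail local classifier.
    tail_classes: length-n_tail array mapping tail_probs columns to global class ids.
    threshold:    ASSUMED confidence cutoff; the paper's actual criterion may differ.
    """
    global_probs = np.asarray(global_probs, dtype=float)
    tail_probs = np.asarray(tail_probs, dtype=float)
    tail_classes = np.asarray(tail_classes)

    # Preliminary decision: the most probable class under the global classifier.
    preds = global_probs.argmax(axis=1)

    # Assumed uncertainty rule: low maximum probability under the global classifier.
    uncertain = global_probs.max(axis=1) < threshold

    # Uncertain samples take the tail classifier's decision, mapped to global ids.
    preds[uncertain] = tail_classes[tail_probs[uncertain].argmax(axis=1)]
    return preds, uncertain


# Toy input: sample 0 is confidently classified, sample 1 is uncertain.
global_probs = [[0.90, 0.05, 0.05],
                [0.40, 0.35, 0.25]]
tail_probs = [[0.5, 0.5],
              [0.2, 0.8]]
tail_classes = [1, 2]  # classes 1 and 2 are treated as tail classes here

preds, uncertain = route_predictions(global_probs, tail_probs, tail_classes)
```

In this toy input the first sample keeps the global classifier's prediction, while the second is handed to the tail classifier and assigned to tail class 2.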

Full Text