Label distribution learning is a widely studied supervised learning paradigm that can handle the problem of label ambiguity. As datasets grow in size, they suffer from the curse of dimensionality: redundant and noisy features degrade the performance of label distribution learning. As a crucial data-preprocessing technique, feature selection can choose discriminative features. However, because of the complexity of label ambiguity, traditional feature selection approaches designed for datasets with logical labels cannot be directly applied to label distribution data. In this paper, a novel granular ball computing-based fuzzy rough set (GBFRS) is proposed for label distribution feature selection. Specifically, the proposed method is first formulated at the finest granularity, i.e., by computing similarity relations between individual samples. Because label ambiguity is exacerbated by label imbalance, the relative similarity among samples in the label distribution space is computed to improve the generalization of the model. Then, a robust approximation strategy is devised that approximates each target sample using both its truly different-class and partially different-class samples. Finally, by introducing granular balls, the method explores similarity relations between balls and samples, yielding the granular ball computing-based fuzzy rough set method, which can simulate the large-scale priority characteristic of human thinking while accounting for local consistency. Extensive experiments on twenty-two datasets show that GBFRS selects more significant features than seven state-of-the-art feature selection algorithms.
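The fuzzy rough set machinery that the abstract describes at the finest granularity (sample-to-sample similarity) can be illustrated with a minimal, generic sketch. This is not the authors' GBFRS: it assumes min-max normalised features, a feature similarity of 1 - |a(x) - a(y)| combined across features by min, a label-distribution similarity of 1 minus the total variation distance, the Kleene-Dienes implicator max(1 - R, S) for the fuzzy lower approximation, and greedy forward selection by mean dependency. It omits the relative label similarity, the robust approximation with partially different-class samples, and granular balls. All function names are hypothetical.

```python
import numpy as np


def feature_similarity(X, features):
    """Hypothetical fuzzy similarity R_B(x, y) = min over a in B of 1 - |a(x) - a(y)|.

    Assumes every column of X has been min-max normalised to [0, 1].
    """
    n = X.shape[0]
    R = np.ones((n, n))
    for a in features:
        col = X[:, a]
        R = np.minimum(R, 1.0 - np.abs(col[:, None] - col[None, :]))
    return R


def label_similarity(D):
    """Assumed label-distribution similarity: 1 - total variation distance, in [0, 1].

    D has shape (n_samples, n_labels) and each row sums to 1.
    """
    return 1.0 - 0.5 * np.abs(D[:, None, :] - D[None, :, :]).sum(axis=2)


def dependency(X, D, features):
    """Mean fuzzy lower approximation over all samples (generic, not GBFRS)."""
    R = feature_similarity(X, features)
    S = label_similarity(D)
    # Kleene-Dienes implicator I(r, s) = max(1 - r, s); the lower approximation
    # of sample x is the infimum of I(R(x, y), S(x, y)) over all samples y.
    lower = np.min(np.maximum(1.0 - R, S), axis=1)
    return float(lower.mean())


def greedy_select(X, D, k):
    """Greedy forward selection: repeatedly add the feature that maximises dependency."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < k:
        scores = [dependency(X, D, selected + [a]) for a in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch, column 0 of `X` separates two groups of samples with different label distributions, while column 1 is unrelated noise, so `greedy_select` should prefer column 0:

```python
X = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 0.5],
              [1.0, 0.5], [1.0, 1.0], [1.0, 0.0]])
D = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
print(greedy_select(X, D, 1))
```

Because adding a feature can only lower R, the dependency is non-decreasing as features are added, which is what makes the greedy forward search a reasonable stopping-by-k heuristic here.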