Data-Driven Interval Granulation Approach Based on Uncertainty Principle for Efficient Classification

Chengying Wu, Qin Xie, Guoyin Wang, Nanfang Luo, Qinghua Zhang, Longjun Yin

doi:10.1109/tfuzz.2023.3287834

Abstract

Granular computing (GrC) is an efficient way to describe data in line with human cognition and plays a critical role in knowledge discovery. Information granules (IGs), the basic computing units of GrC, are key components of knowledge representation and processing. Rough sets, one of the classical GrC models, generate IGs based on indiscernibility relations. These relations granulate nominal attributes effectively and generate desirable IGs, but they may cause information loss when applied to numerical attributes. To overcome this issue, fuzzy rough sets (FRS) and neighborhood rough sets (NRS) were developed as extensions of rough sets. However, to generate high-performance IGs in practice, FRS requires prior knowledge to choose a fuzzy operation in advance, and NRS requires computing an optimal neighborhood radius. In addition, both FRS and NRS take each object as a computing unit, so the generated IGs constitute a covering rather than a partition of the universe. This process is not only time-consuming but also prone to generating redundant IGs. Therefore, in this study, a data-driven interval granulation approach based on the uncertainty principle is proposed to generate justifiable interval neighborhood IGs with flexibility and tolerance. First, the interval granulation of attribute values and the interval equivalence relation are defined. Next, using the interval equivalence relation, a novel interval rough sets model is developed to unify numerical and nominal attributes in one framework, together with a membership function that requires no prior knowledge. Then, a highly effective classifier named CAR-ING, which integrates an attribute reduction technique, is developed from the perspective of interval neighborhood IGs. Finally, experiments and comparisons on 17 widely used UCI benchmark datasets and 3 real medical datasets from the UK Biobank demonstrate that CAR-ING performs significantly better than four state-of-the-art GrC-based classifiers and five classical machine learning classifiers. Additionally, the efficiency of CAR-ING is demonstrated on the 20 datasets.
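
The contrast between a covering and a partition can be illustrated with a minimal Python sketch. It is not the granulation procedure of the paper: the equal-width intervals used here are only an assumed stand-in for the data-driven, uncertainty-principle-based intervals, and the function names, sample values, radius, and width are all hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Set


def neighborhood_granules(values: List[float], radius: float) -> List[FrozenSet[int]]:
    """One granule per object: indices of all objects within `radius` of it."""
    return [
        frozenset(j for j, v in enumerate(values) if abs(v - u) <= radius)
        for u in values
    ]


def interval_granules(values: List[float], width: float) -> List[FrozenSet[int]]:
    """Group objects whose values fall into the same equal-width interval.

    Assumption: fixed-width intervals stand in for data-driven ones.
    """
    groups: Dict[int, Set[int]] = {}
    for i, v in enumerate(values):
        groups.setdefault(math.floor(v / width), set()).add(i)
    return [frozenset(g) for _, g in sorted(groups.items())]


def is_partition(granules: List[FrozenSet[int]], n: int) -> bool:
    """True if every object index 0..n-1 appears in exactly one granule."""
    seen = sorted(i for g in granules for i in g)
    return seen == list(range(n))


values = [0.1, 0.2, 0.3, 0.9]  # one numerical attribute, four objects
nbr = neighborhood_granules(values, radius=0.15)
itv = interval_granules(values, width=0.25)

print(is_partition(nbr, len(values)))  # neighborhood granules overlap
print(is_partition(itv, len(values)))  # interval granules are disjoint
```

In this toy setting the neighborhood approach builds one granule per object and the granules overlap, whereas grouping by interval yields fewer, disjoint granules, which is the property the abstract attributes to an equivalence-relation-based granulation.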

Full Text