Fuzzy rough sets (FRSs) are considered to be a powerful model for analyzing uncertainty in data. This model encapsulates two types of uncertainty: 1) fuzziness coming from the vagueness in human concept formation and 2) roughness rooted in the granulation coming with human cognition. The rough set theory has been widely applied to feature selection, attribute reduction, and classification. However, it is reported that the classical FRS model is sensitive to noisy information. To address this problem, several robust models have been developed in recent years. Nevertheless, these models do not consider a statistical distribution of data, which is an important type of uncertainty. Data distribution serves as crucial information for designing an optimal classification or regression model. Thus, we propose a data-distribution-aware FRS model that considers distribution information and incorporates it in computing lower and upper fuzzy approximations. The proposed model considers not only the similarity between samples, but also the probability density of classes. In order to demonstrate the effectiveness of the proposed model, we design a new sample evaluation index for prototype-based classification based on the model, and a prototype selection algorithm is developed using this index. Furthermore, a robust classification algorithm is constructed with prototype covering and nearest neighbor classification. Experimental results confirm the robustness and effectiveness of the proposed model.
Read full abstract