Abstract

Existing fuzzy rough set (FRS) models all assume that the decision attribute partitions the sample set into several "crisp" decision classes, and this treatment makes the models sensitive to noisy information during feature selection. To address this problem, this paper proposes a robust fuzzy rough set model based on representative samples (RS-FRS). First, a fuzzy membership degree is defined for each sample to reflect its fuzziness and uncertainty, and the RS-FRS model is constructed on this basis to reduce the influence of noisy samples. The RS-FRS model requires no parameters to be set in advance, which effectively reduces model complexity and human intervention. The related properties of the RS-FRS model are then studied, and a sample pair selection (SPS) algorithm based on RS-FRS is used for feature selection. RS-FRS is tested and analysed on 12 public datasets. The experimental results show that the proposed RS-FRS model can effectively select the most relevant features and is robust to noisy information. The proposed model is well suited to data processing and can effectively improve the performance of feature selection.

Highlights

  • In the current era of big data, data are massive in scale and high-dimensional in representation. The high dimensionality arises mainly because data often contain a large number of redundant or irrelevant features, which seriously reduces the processing capacity and time efficiency of pattern classification as well as the resolution of decision making

  • Conclusions: The development of robust fuzzy rough set (FRS) models is a hot spot in FRS theory, and such models offer clear advantages for feature selection on noisy data

  • A nonparametric fuzzy membership degree is defined via fuzzy granular computing, and an FRS model based on representative samples is proposed

Introduction

In the current era of big data, data are massive in scale and high-dimensional in representation. The high dimensionality arises mainly because data often contain a large number of redundant or irrelevant features, which seriously reduces the processing capacity and time efficiency of pattern classification as well as the resolution of decision making. High-dimensional data pose great challenges to fast, timely, and accurate data mining. Therefore, how to effectively select features for such data has become one of the hot topics in the field of machine learning [1, 2]. FRS theory has attracted wide attention and been applied in data mining, machine learning, pattern recognition, and other fields [11, 12, 13]. In the upper and lower approximations of the classical FRS model, the calculation is determined by the sample nearest to the given target sample, so the classical FRS model, being constrained by a single nearest sample, is extremely sensitive to noise.
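To make this noise sensitivity concrete, the following minimal sketch (not the paper's RS-FRS) computes the classical Dubois–Prade style lower and upper approximations of a crisp decision class using the Kleene–Dienes implicator and the min t-norm; the similarity measure and the toy data are illustrative assumptions rather than the paper's choices. Because the lower approximation reduces to one minus the similarity to the closest out-of-class sample, a single mislabelled neighbour is enough to collapse it.

```python
# Minimal sketch (not the paper's RS-FRS): classical fuzzy rough lower/upper
# approximations of a crisp decision class, using the Kleene-Dienes implicator
# max(1 - R, A) and the min t-norm. The similarity measure below is an
# illustrative assumption, not the one used in the paper.
import numpy as np

def fuzzy_similarity(X):
    """Pairwise fuzzy similarity R(x, y) = 1 - mean absolute feature difference,
    assuming features are already scaled to [0, 1]."""
    diff = np.abs(X[:, None, :] - X[None, :, :])   # shape (n, n, d)
    return 1.0 - diff.mean(axis=2)                 # shape (n, n)

def classical_approximations(R, class_mask):
    """Lower/upper approximation memberships of every sample with respect to
    one crisp decision class (class_mask is a boolean vector of length n)."""
    A = class_mask.astype(float)
    # lower(x) = inf_y max(1 - R(x, y), A(y)): in-class samples contribute 1,
    # so the value is decided solely by the most similar out-of-class sample,
    # which is why a single mislabelled neighbour is enough to ruin it.
    lower = np.min(np.maximum(1.0 - R, A[None, :]), axis=1)
    # upper(x) = sup_y min(R(x, y), A(y)): decided by the most similar
    # in-class sample.
    upper = np.max(np.minimum(R, A[None, :]), axis=1)
    return lower, upper

# Toy example: sample 3 has class-0-like features but is labelled class 1 (noise).
X = np.array([[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.12, 0.22]])
y = np.array([0, 0, 1, 1])
R = fuzzy_similarity(X)
lower, upper = classical_approximations(R, y == 0)
print(lower)   # lower memberships of class 0 collapse because of the noisy sample
print(upper)
```

In this toy example, the mislabelled sample drives the lower approximation of class 0 down to nearly zero for every genuine class-0 sample, which is precisely the behaviour the representative-sample construction of RS-FRS is designed to mitigate.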
