Abstract

For continuous numerical data sets, attribute reduction based on neighborhood rough sets is an important step for improving classification performance. However, most traditional reduction algorithms can only handle finite sets, and they yield low accuracy and high cardinality. In this paper, a novel attribute reduction method using Lebesgue and entropy measures in neighborhood rough sets is proposed, which can deal with continuous numerical data while maintaining the original classification information. First, the Fisher score method is employed to eliminate irrelevant attributes, which significantly reduces computational complexity for high-dimensional data sets. Then, the Lebesgue measure is introduced into neighborhood rough sets to investigate uncertainty measures. To analyze the uncertainty and noise of neighborhood decision systems, some neighborhood entropy-based uncertainty measures are presented on the basis of Lebesgue and entropy measures, and, by combining the algebra view with the information view in neighborhood rough sets, a neighborhood roughness joint entropy is developed for neighborhood decision systems. Moreover, some of their properties are derived and the relationships among them are established, which helps in understanding the essence of knowledge and the uncertainty of neighborhood decision systems. Finally, a heuristic attribute reduction algorithm is designed to improve the classification performance of large-scale complex data. Experimental results on an illustrative instance and several public data sets show that the proposed method is effective at selecting the most relevant attributes with high classification accuracy.
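The Fisher score pre-filtering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common textbook form of the Fisher score (between-class scatter divided by within-class scatter, computed per attribute); the abstract does not give the exact formula used in the paper, so this may differ from the authors' version. The function names `fisher_scores` and `top_k_attributes` and the toy data are hypothetical and serve only as illustration.

```python
import numpy as np


def fisher_scores(X, y):
    """Per-attribute Fisher score (assumed textbook form, not taken from the paper).

    score_j = sum_c n_c * (mean_cj - mean_j)^2 / sum_c n_c * var_cj
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]          # samples belonging to this class
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Guard against division by zero when an attribute is constant within classes.
    return between / np.maximum(within, 1e-12)


def top_k_attributes(X, y, k):
    """Indices of the k attributes with the largest Fisher score."""
    scores = fisher_scores(X, y)
    return np.argsort(scores)[::-1][:k]


# Toy data (hypothetical): attribute 0 separates the classes, attribute 1 does not.
X = np.array([[0.1, 5.0],
              [0.2, 1.0],
              [0.9, 5.1],
              [1.0, 0.9]])
y = np.array([0, 0, 1, 1])

scores = fisher_scores(X, y)
selected = top_k_attributes(X, y, 1)
```

In a pipeline of the kind the abstract describes, only the attributes kept by such a filter would then be passed to the neighborhood rough set reduction stage, which is where the computational saving comes from.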

Highlights

  • Over the past few decades, data classification has become one of the important aspects of data mining, machine learning, pattern recognition, etc.

  • The objective of an attribute reduction method usually includes two aspects: one is to select a small number of attributes and the other is to maintain high classification accuracy

  • To verify the classification performance of the proposed attribute reduction method described in Subsection 3.4, the results of all compared algorithms are obtained and analyzed on nine public data sets


Summary

Introduction

Over the past few decades, data classification has become one of the important aspects of data mining, machine learning, pattern recognition, etc. As an important application of rough set models to a variety of practical problems, attribute reduction methods in information systems have attracted wide attention from researchers [1,2]. Attribute reduction is a fundamental research theme in the field of granular computing [3], and in rough set theory it has been recognized as an important feature selection method [2]. Depending on whether the evaluation criterion involves classification models, the existing feature selection methods can be broadly classified into the following three categories [5]: filter,

Entropy 2019, 21, 138; doi:10.3390/e21020138 www.mdpi.com/journal/entropy
