Abstract

Feature selection is one of the core topics in rough set theory and its applications. Because the reduction ability and classification performance of many feature selection algorithms based on rough set theory and its extensions are unsatisfactory, this paper proposes a feature selection algorithm that combines the information-theoretic view and the algebraic view in the neighborhood decision system. First, the neighborhood relation of the neighborhood rough set model is used to retain the classification information of continuous data, and several uncertainty measures based on neighborhood information entropy are studied. Second, to fully reflect the decision ability and classification performance of the neighborhood system, neighborhood credibility and neighborhood coverage are defined and incorporated into the neighborhood joint entropy. Third, a feature selection algorithm based on neighborhood joint entropy is designed; it overcomes the limitation that most feature selection algorithms consider only the information-theoretic definition or only the algebraic definition. Finally, experiments and statistical analyses on nine data sets show that the algorithm can effectively select an optimal feature subset and that the selected subset maintains or improves the classification performance of the data set.
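The neighborhood relation and neighborhood information entropy mentioned above can be illustrated with a minimal sketch. It assumes the common δ-neighborhood from the neighborhood rough set literature (all samples within Euclidean distance δ on the chosen feature subset B, features already normalized) and the neighborhood entropy NH_δ(B) = -(1/n) Σ_i log2(|δ_B(x_i)| / n). The paper's exact distance, normalization, and δ may differ, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Set


def delta_neighborhoods(
    data: Sequence[Sequence[float]], features: Sequence[int], delta: float
) -> List[Set[int]]:
    """For each sample i, return the indices j whose Euclidean distance to i,
    measured only on `features`, is at most `delta` (i itself is always included)."""
    n = len(data)
    result: List[Set[int]] = []
    for i in range(n):
        neighbors = set()
        for j in range(n):
            dist = math.sqrt(sum((data[i][f] - data[j][f]) ** 2 for f in features))
            if dist <= delta:
                neighbors.add(j)
        result.append(neighbors)
    return result


def neighborhood_entropy(
    data: Sequence[Sequence[float]], features: Sequence[int], delta: float
) -> float:
    """Assumed form: NH_delta(B) = -(1/n) * sum_i log2(|delta_B(x_i)| / n)."""
    n = len(data)
    neighborhoods = delta_neighborhoods(data, features, delta)
    return -sum(math.log2(len(nb) / n) for nb in neighborhoods) / n
```

For the four points `[[0.0, 0.0], [0.05, 0.0], [1.0, 1.0], [1.0, 0.95]]` with δ = 0.1, every neighborhood contains two of the four samples, so the entropy is 1.0; shrinking δ to 0 makes every neighborhood a singleton and the entropy rises to log2(4) = 2.0. Smaller neighborhoods mean finer granulation and therefore more uncertainty resolved by the features.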

Highlights

  • Society has entered the era of network information: the rapid development of computer and network information technology has made data and information in many fields grow rapidly

  • To analyze the uncertainty of knowledge in the neighborhood rough set effectively, credibility and coverage are introduced into the neighborhood decision system: neighborhood credibility and neighborhood coverage are defined and then incorporated into the neighborhood joint entropy

  • Classification results of the BONJE algorithm: this experiment compares the classification accuracy and the number of features of the original data with those of the feature subset selected by the algorithm based on neighborhood joint entropy (BONJE)
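Neighborhood credibility and neighborhood coverage can be illustrated with a minimal sketch. It assumes the classical rough-set rule measures, credibility = |X ∩ Y| / |X| and coverage = |X ∩ Y| / |Y|, applied with X = δ_B(x), the δ-neighborhood of sample x, and Y = [x]_D, the decision class of x. It also computes the standard neighborhood joint entropy -(1/n) Σ_i log2(|δ_B(x_i) ∩ [x_i]_D| / n). How the paper folds credibility and coverage into its joint entropy is not stated here, so that step is deliberately not reproduced; all function names are hypothetical.

```python
import math
from typing import Hashable, List, Sequence, Set, Tuple


def neighborhoods(
    data: Sequence[Sequence[float]], features: Sequence[int], delta: float
) -> List[Set[int]]:
    """delta-neighborhood of every sample under Euclidean distance on `features`."""
    n = len(data)
    return [
        {j for j in range(n)
         if math.sqrt(sum((data[i][f] - data[j][f]) ** 2 for f in features)) <= delta}
        for i in range(n)
    ]


def credibility_coverage(
    data: Sequence[Sequence[float]], labels: Sequence[Hashable],
    features: Sequence[int], delta: float,
) -> List[Tuple[float, float]]:
    """Per sample: (|nbhd ∩ class| / |nbhd|, |nbhd ∩ class| / |class|)."""
    n = len(data)
    nbs = neighborhoods(data, features, delta)
    out = []
    for i in range(n):
        same_class = {j for j in range(n) if labels[j] == labels[i]}
        overlap = len(nbs[i] & same_class)
        # Credibility: how pure the neighborhood is w.r.t. the sample's class.
        # Coverage: how much of the sample's class the neighborhood captures.
        out.append((overlap / len(nbs[i]), overlap / len(same_class)))
    return out


def neighborhood_joint_entropy(
    data: Sequence[Sequence[float]], labels: Sequence[Hashable],
    features: Sequence[int], delta: float,
) -> float:
    """Standard form: -(1/n) * sum_i log2(|delta_B(x_i) ∩ [x_i]_D| / n)."""
    n = len(data)
    nbs = neighborhoods(data, features, delta)
    total = 0.0
    for i in range(n):
        same_class = {j for j in range(n) if labels[j] == labels[i]}
        total += math.log2(len(nbs[i] & same_class) / n)
    return -total / n
```

With one feature, values `[0.0, 0.1, 0.2, 1.0]`, labels `a, a, b, b` and δ = 0.15, the second sample's neighborhood {0, 1, 2} crosses the class boundary, so its credibility drops to 2/3 while its coverage stays 1.0; the joint entropy is 1.5.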


Summary

Introduction

Society has entered the era of network information: the rapid development of computer and network information technology has made data and information in many fields grow rapidly. Feature selection algorithms that satisfy monotonicity share a weakness: when the classification performance of the original data set is poor, the value of the evaluation function is low and the final reduction is unsatisfactory [18]. To address this, Li et al. [19] proposed a non-monotonic feature selection algorithm based on the decision rough set model. To build a more comprehensive measurement mechanism and to avoid poor selection results when the classification performance of the original data set is weak, this paper combines the information-theoretic view and the algebraic view in the neighborhood decision system and proposes a heuristic non-monotonic feature selection algorithm.
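A heuristic non-monotonic search of the kind described above can be sketched generically. This is not the BONJE algorithm: the score function is a hypothetical stand-in for the paper's neighborhood-joint-entropy measure, and the `patience` stopping rule is an assumption. What it shows is that, when the evaluation function is not monotonic, the search cannot stop at the first drop; it keeps adding the best candidate and returns the best subset seen.

```python
from typing import Callable, FrozenSet, Tuple

Score = Callable[[FrozenSet[int]], float]


def greedy_nonmonotonic_selection(
    n_features: int, score: Score, patience: int = 2
) -> Tuple[FrozenSet[int], float]:
    """Forward greedy search that tolerates a non-monotonic score.

    At each step the best-scoring feature is added even if the score falls.
    The search stops after `patience` consecutive steps without beating the
    best subset seen so far (or when no features remain) and returns that subset.
    """
    selected: FrozenSet[int] = frozenset()
    best_subset, best_score = selected, score(selected)
    stale = 0
    remaining = set(range(n_features))
    while remaining and stale < patience:
        # Evaluate every one-feature extension of the current subset.
        f, s = max(((c, score(selected | {c})) for c in sorted(remaining)),
                   key=lambda t: t[1])
        selected = selected | {f}
        remaining.discard(f)
        if s > best_score:
            best_subset, best_score, stale = selected, s, 0
        else:
            stale += 1
    return best_subset, best_score


# Hypothetical toy score: features 0 and 2 help, every other feature costs 0.05.
USEFUL = {0: 0.4, 2: 0.3}


def toy_score(subset: FrozenSet[int]) -> float:
    gain = sum(USEFUL.get(f, 0.0) for f in subset)
    noise = sum(1 for f in subset if f not in USEFUL)
    return gain - 0.05 * noise


if __name__ == "__main__":
    print(greedy_nonmonotonic_selection(4, toy_score))
```

With the toy score, the search adds features 0 and 2, then adds the noisy features 1 and 3 and watches the score fall, yet still returns `frozenset({0, 2})` with a score of about 0.7.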

Basic Concepts
Information Entropy Measures
Neighborhood Rough Set
Feature Selection Algorithm Design
Neighborhood Credibility and Neighborhood Coverage
Heuristic Non-Monotonic Feature Selection Algorithm Design
Experimental Data Introduction
Experimental Environment
Neighborhood Radius Selection
Classification Results of BONJE Algorithm
The Performance of BONJE Algorithm on Low-Dimensional Data Sets
The Performance of BONJE Algorithm on High-Dimensional Data Sets
Statistical Analyses
Conclusions

