Feature Selection for Unbalanced Distribution Hybrid Data Based on ${k}$-Nearest Neighborhood Rough Set

Weihua Xu, Zheng Liu, Ziting Yuan

doi:10.1109/tai.2023.3237203

Abstract

Neighborhood rough sets are now widely used to process numerical data. Nevertheless, most existing neighborhood rough set models cannot distinguish class-mixture samples well in classification problems; that is, they cannot classify effectively when the data have an unbalanced distribution. For this reason, in this article we propose a new feature selection method that takes both heterogeneous data and feature interaction into consideration. The proposed model integrates the advantages of the δ-neighborhood and the $k$-nearest neighbor, so that heterogeneous data can be handled better than with existing neighborhood models. We use information-theoretic measures such as mutual information and conditional mutual information, and employ an iterative strategy to define the importance of each feature in decision-making. Furthermore, we design a feature extraction algorithm based on this idea. Experimental results show that the proposed algorithm outperforms some existing algorithms, particularly the δ-neighborhood rough set model and the $k$-nearest neighborhood rough set model.
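
To make the role of mutual information and conditional mutual information concrete, the following is a minimal sketch of a generic greedy selection loop over discrete features. It is not the authors' algorithm: the abstract does not give the exact importance measure or the neighborhood-based entropy, so plain Shannon entropy on discrete columns is assumed here, and the names `entropy`, `mutual_information`, `conditional_mutual_information`, and `greedy_select` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(columns):
    """Shannon entropy (in bits) of the joint distribution of the given columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy([x]) + entropy([y]) - entropy([x, y])


def conditional_mutual_information(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return entropy([x, z]) + entropy([y, z]) - entropy([x, y, z]) - entropy([z])


def greedy_select(features, labels, n_select):
    """Iteratively pick the feature that adds the most information about the labels.

    Assumption (not from the paper): the first feature is scored by I(f; D),
    and each later feature by I(f; D | already selected features).
    """
    selected = []
    remaining = set(features)
    while remaining and len(selected) < n_select:

        def score(name):
            if not selected:
                return mutual_information(features[name], labels)
            # Treat the already selected features as one composite variable.
            joint = list(zip(*(features[s] for s in selected)))
            return conditional_mutual_information(features[name], labels, joint)

        # Sorting makes tie-breaking deterministic (first name alphabetically).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the label is a XOR b, and c is a constant (uninformative) feature.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
c = [0, 0, 0, 0, 0, 0, 0, 0]
labels = [0, 1, 1, 0, 0, 1, 1, 0]

chosen = greedy_select({"a": a, "b": b, "c": c}, labels, n_select=2)
print(chosen)
```

On this toy data neither `a` nor `b` alone shares any information with the label, yet `b` becomes fully informative once `a` is known. This is the kind of feature interaction that a purely marginal score would miss and that a conditional score can capture.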

Full Text