Feature selection is an important preprocessing technique in data mining that can improve classification accuracy and shrink the feature space by eliminating redundant features. Because traditional feature selection algorithms suffer from high time complexity and low classification accuracy, an effective algorithm based on Information Gain and decision information is designed. The algorithm applies Information Gain to perform preliminary dimensionality reduction on high-dimensional datasets, and then uses decision information as an evaluation function to select features that carry important information. First, the concept of the joint information granule is defined, and neighborhood information entropy measures are proposed based on it; the relationships among these measures are then studied, which helps characterize the uncertainty in data. Second, a non-monotonic algorithm using the decision information contained in the neighborhood information entropy measures is proposed to overcome the shortcomings of algorithms based on monotonic evaluation functions, thereby improving classification accuracy. Third, to reduce the time cost of the designed algorithm on high-dimensional datasets, Information Gain is introduced to preliminarily eliminate irrelevant features. Finally, ablation and comparison experiments on twelve public datasets demonstrate the low time cost and the high classification accuracy of our algorithm, respectively.
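To make the two-stage pipeline concrete, the sketch below is a minimal illustration in Python, assuming scikit-learn is available. It approximates Information Gain with mutual_info_classif and stands in for the paper's neighborhood-entropy decision-information measure with a simplified neighborhood decision-consistency score; the function names, the keep_ratio and radius parameters, and the greedy forward search are all illustrative assumptions, not the authors' exact algorithm.

```python
# A minimal sketch of the two-stage pipeline described in the abstract.
# Stage 1 filters features by an Information Gain proxy; stage 2 greedily
# selects features under a simplified neighborhood-granule consistency score.
# This is NOT the paper's exact method; names and parameters are assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.neighbors import NearestNeighbors


def information_gain_filter(X, y, keep_ratio=0.5):
    """Stage 1: rank features by mutual information with the class label
    (a rough proxy for Information Gain) and keep the top fraction."""
    scores = mutual_info_classif(X, y, random_state=0)
    k = max(1, int(keep_ratio * X.shape[1]))
    return np.argsort(scores)[::-1][:k]


def neighborhood_decision_consistency(X, y, radius=0.15):
    """Stage 2 evaluation: average fraction of each sample's neighborhood
    (within a normalized radius) that shares its decision label -- a
    simplified neighborhood-granule consistency score, higher is better."""
    # Min-max normalize so one radius applies across the selected features.
    Xn = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)
    nbrs = NearestNeighbors(radius=radius).fit(Xn)
    neigh = nbrs.radius_neighbors(Xn, return_distance=False)
    agree = [np.mean(y[idx] == y[i]) for i, idx in enumerate(neigh)]
    return float(np.mean(agree))


def greedy_select(X, y, candidates, radius=0.15):
    """Greedy forward selection: add the candidate feature that most
    improves the consistency score; stop when no candidate improves it,
    so the search tolerates a non-monotonic evaluation function."""
    selected, best = [], -np.inf
    remaining = list(candidates)
    improved = True
    while improved and remaining:
        improved = False
        for f in list(remaining):
            score = neighborhood_decision_consistency(
                X[:, selected + [f]], y, radius)
            if score > best:
                best, pick, improved = score, f, True
        if improved:
            selected.append(pick)
            remaining.remove(pick)
    return selected, best


if __name__ == "__main__":
    data = load_breast_cancer()
    X, y = data.data, data.target
    coarse = information_gain_filter(X, y, keep_ratio=0.5)  # preliminary reduction
    subset, score = greedy_select(X, y, list(coarse))
    print(f"selected {len(subset)} features, consistency = {score:.3f}")
```

The forward search above is non-monotonic only in the weak sense that adding a feature may lower the score and is then skipped; the paper's actual joint-information-granule measures and stopping rule may differ.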