Abstract
Feature selection is an important preprocessing technique in data mining: by eliminating redundant features, it can improve classification accuracy and shrink the feature space. Because traditional feature selection algorithms often suffer from high time complexity and low classification accuracy, we design an algorithm that combines Information Gain with decision information. The algorithm uses Information Gain to perform a preliminary dimensionality reduction on high-dimensional datasets, and then uses decision information as the feature evaluation function to select the features that carry important information. First, the concept of a joint information granule is defined, and neighborhood information entropy measures are proposed on the basis of this granule. The relationships among these measures are also studied, which helps in analyzing the uncertainty in data. Second, a nonmonotonic algorithm that uses the decision information from the neighborhood information entropy measures is proposed to overcome the shortcomings of algorithms based on monotonic evaluation functions, thereby improving classification accuracy. Third, to reduce the time cost of the designed algorithm on high-dimensional datasets, Information Gain is introduced to eliminate irrelevant features in a preliminary step. Finally, experiments on twelve public datasets are reported: the ablation experiments demonstrate the low time cost of our algorithm, and the comparison experiments demonstrate its high classification accuracy.
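As an illustration of the Information Gain pre-filtering step only, the following minimal sketch scores discrete features by IG(D; f) = H(D) - H(D | f) and keeps the top k. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `prefilter`), the top-k cutoff, and the toy data are assumptions made for this example, and continuous features would first need discretization. The neighborhood entropy measures and the decision information are not sketched, since the abstract does not define them.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(feature, labels):
    """IG(D; f) = H(D) - H(D | f) for one discrete feature column."""
    total = len(labels)
    # Group the class labels by the value the feature takes on each sample.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def prefilter(columns, labels, k):
    """Hypothetical helper: names of the k features with the highest IG."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (assumed, not from the paper): four samples, three discrete features.
columns = {
    "f1": [0, 0, 1, 1],  # separates the two classes perfectly
    "f2": [0, 1, 0, 1],  # independent of the class label
    "f3": [0, 0, 0, 1],  # partially informative
}
labels = ["a", "a", "b", "b"]
selected = prefilter(columns, labels, k=2)
```

On this toy data the class-independent feature `f2` has zero Information Gain and is dropped, so a more expensive evaluation function would only need to examine the remaining features.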