Abstract

Feature selection is an important learning task in data mining and knowledge discovery. However, the fuzziness, uncertainty, and noise present in data greatly complicate the construction of learning models. Moreover, most existing work focuses on low-order correlations between variables, measured with low-dimensional mutual information, and pays little attention to high-order interactions among multiple variables, so some potentially important dependency information is lost. Motivated by these two issues, a robust knowledge metric approach is proposed to capture and mine the latent information hidden in feature interactions. First, a robust fuzzy granularity space is constructed from the different granular structures induced by different features, and robust fuzzy uncertainty measures (RFUMs) are then devised. The RFUMs are used to measure pair-wise, three-order, and even higher-order interaction dependencies among features. Further, a constrained high-order interaction evaluation function inspired by the N-gram language model is formulated, and a corresponding high-order interaction feature selection algorithm with RFUMs (HIFS-RFUMs) is designed. Comparative experiments against seven representative algorithms on twenty datasets illustrate the effectiveness of HIFS-RFUMs. In addition, ablation experiments are conducted on the high-order interaction feature selection algorithm with fuzzy uncertainty measures (HIFS-FUMs) and the relative reduction algorithm with RFUMs (R2-RFUMs), which demonstrate the robustness of the metric and its effectiveness in mining high-order interactive features, respectively.
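
As a minimal illustration of why pair-wise measures can miss high-order interaction, the sketch below uses classical Shannon mutual information on a toy XOR dataset. This is not the RFUMs or the HIFS-RFUMs algorithm described above; the data and the helper names `entropy` and `mutual_information` are assumptions made only for this example. Each feature alone carries no information about the label, while the two features taken jointly determine it completely.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Shannon entropy (in bits) of a list of hashable observations."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Toy data (an assumption for illustration): every combination of two
# binary features, with the label defined as their XOR.
rows = list(product([0, 1], repeat=2))
f1 = [a for a, _ in rows]
f2 = [b for _, b in rows]
y = [a ^ b for a, b in rows]

# Low-order view: each feature is scored against the label on its own.
mi_f1 = mutual_information(f1, y)
mi_f2 = mutual_information(f2, y)

# Higher-order view: the two features are scored jointly against the label.
mi_joint = mutual_information(list(zip(f1, f2)), y)

print(f"I(f1; y)     = {mi_f1:.3f} bits")
print(f"I(f2; y)     = {mi_f2:.3f} bits")
print(f"I(f1, f2; y) = {mi_joint:.3f} bits")
```

A selector that ranks features by individual mutual information alone would discard both features here, even though together they predict the label exactly. This is the kind of dependency that high-order interaction measures aim to retain.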
