All-relevant feature selection using multidimensional filters with exhaustive search

Krzysztof Mnich, Witold R Rudnicki

doi:10.1016/j.ins.2020.03.024

Open Access

Abstract

This paper describes a method for the identification of informative variables in an information system with discrete decision variables. It is targeted specifically towards the discovery of variables that are non-informative when considered alone, but informative when the synergistic interactions between multiple variables are considered. To this end, mutual entropy of all possible k-tuples of variables with a decision variable is computed. Then, for each variable, the maximum information gain due to interactions with other variables is obtained. For non-informative variables, this quantity conforms to well-known statistical distributions. This allows for discerning the truly informative variables from the non-informative ones. To demonstrate this approach, the method is applied to several synthetic datasets that involve complex multidimensional interactions between variables. The performance of the method is also validated on a real-world dataset. It is shown that the method is capable of identifying the most important informative variables, even when the dimensionality of the analysis is smaller than the true dimensionality of the problem. What is more, the high sensitivity of the algorithm allows for the detection of the influence of nuisance variables on the response variable.
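The core computation summarized above can be illustrated with a minimal sketch. It assumes already-discretized variables and plain plug-in entropy estimates, and it reads "information gain due to interactions" as the drop in conditional entropy of the decision when a variable is added to the rest of its k-tuple. The function names (`entropy`, `cond_entropy`, `max_information_gain`) and the toy XOR data are hypothetical and chosen for illustration; this is not the authors' implementation, and the statistical test that separates informative from non-informative variables is omitted.

```python
from collections import Counter
from itertools import combinations
from math import log
import random


def entropy(rows):
    """Plug-in Shannon entropy (in nats) of a list of hashable rows."""
    n = len(rows)
    return -sum(c / n * log(c / n) for c in Counter(rows).values())


def cond_entropy(y, cols):
    """H(Y | cols) = H(Y, cols) - H(cols), where cols is a list of columns."""
    joint = list(zip(y, *cols))
    given = list(zip(*cols)) if cols else [()] * len(y)
    return entropy(joint) - entropy(given)


def max_information_gain(X, y, k):
    """For each variable, take the maximum over all k-tuples containing it of
    IG = H(Y | rest of the tuple) - H(Y | whole tuple).

    X is a list of columns (one list of discrete values per variable).
    The search is exhaustive, so the cost grows as C(len(X), k).
    """
    best = [0.0] * len(X)
    for tup in combinations(range(len(X)), k):
        h_full = cond_entropy(y, [X[j] for j in tup])
        for i in tup:
            h_rest = cond_entropy(y, [X[j] for j in tup if j != i])
            best[i] = max(best[i], h_rest - h_full)
    return best


# Toy data (an assumption for illustration): y = a XOR b, plus two noise variables.
rng = random.Random(0)
n = 400
a = [rng.randint(0, 1) for _ in range(n)]
b = [rng.randint(0, 1) for _ in range(n)]
c = [rng.randint(0, 1) for _ in range(n)]
d = [rng.randint(0, 1) for _ in range(n)]
y = [ai ^ bi for ai, bi in zip(a, b)]

gains1 = max_information_gain([a, b, c, d], y, k=1)  # one variable at a time
gains2 = max_information_gain([a, b, c, d], y, k=2)  # all pairs
print("k=1:", [round(g, 3) for g in gains1])
print("k=2:", [round(g, 3) for g in gains2])
```

On this toy problem, a one-dimensional analysis gives `a` and `b` gains close to zero, while the two-dimensional analysis gives them gains close to the entropy of `y`; the noise variables stay near zero in both cases. Deciding whether a small nonzero gain is significant is the part handled by the statistical distributions mentioned in the abstract, which this sketch does not attempt.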

Full Text