Abstract

This paper describes a method for the identification of informative variables in an information system with discrete decision variables. It specifically targets the discovery of variables that are non-informative when considered alone, but informative when the synergistic interactions between multiple variables are considered. To this end, the mutual entropy of all possible k-tuples of variables with the decision variable is computed. Then, for each variable, the maximum information gain due to interactions with other variables is obtained. For non-informative variables, this quantity conforms to well-known statistical distributions, which allows the truly informative variables to be distinguished from the non-informative ones. To demonstrate this approach, the method is applied to several synthetic datasets that involve complex multidimensional interactions between variables. The performance of the method is also validated on a real-world dataset. It is shown that the method is capable of identifying the most important informative variables, even when the dimensionality of the analysis is smaller than the true dimensionality of the problem. Moreover, the high sensitivity of the algorithm allows for the detection of the influence of nuisance variables on the response variable.
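The core computation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`entropy`, `cond_entropy`, `max_information_gain`) are hypothetical, the information gain of a variable within a k-tuple is assumed here to be the drop in conditional entropy of the decision variable when that variable is added to the remaining k-1 variables, and the statistical test that separates informative from non-informative variables is omitted.

```python
from collections import Counter
from itertools import combinations
from math import log

def entropy(outcomes):
    """Shannon entropy (in nats) of a sequence of hashable outcomes."""
    n = len(outcomes)
    return -sum(c / n * log(c / n) for c in Counter(outcomes).values())

def cond_entropy(y, X, cols):
    """H(Y | X[cols]) computed as H(Y, X[cols]) - H(X[cols])."""
    keys = [tuple(row[c] for c in cols) for row in X]
    return entropy(list(zip(y, keys))) - entropy(keys)

def max_information_gain(X, y, k=2):
    """For each variable, the largest gain over all k-tuples containing it.

    Assumed definition (illustrative only): the gain of variable j in a
    tuple S is H(Y | S without j) - H(Y | S).
    """
    p = len(X[0])
    best = [0.0] * p
    for cols in combinations(range(p), k):  # exhaustive search over k-tuples
        h_full = cond_entropy(y, X, cols)
        for j in cols:
            rest = tuple(c for c in cols if c != j)
            gain = cond_entropy(y, X, rest) - h_full
            best[j] = max(best[j], gain)
    return best

# Toy data: y is the XOR of the first two variables; the third is irrelevant.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [a ^ b for a, b, c in X]

gains_1d = max_information_gain(X, y, k=1)  # each variable considered alone
gains_2d = max_information_gain(X, y, k=2)  # pairwise interactions
```

On this toy XOR data, every variable has a gain of essentially zero when considered alone (`k=1`), while with `k=2` the two interacting variables each reach a gain of about log 2 nats and the irrelevant variable stays at about zero. This is the kind of purely synergistic signal the method is designed to detect.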

