Abstract

Feature selection has been shown to be beneficial for many data mining and machine learning tasks, especially for big data analytics. Mutual Information (MI) is a well-known information-theoretic approach used to evaluate the relevance of feature subsets and class labels. However, estimating high-dimensional MI poses significant challenges. Consequently, a great deal of research has focused on using low-order MI approximations or computing a lower bound on MI called Variational Information (VI). These methods often require certain assumptions made on the probability distributions of features such that these distributions are realistic yet tractable to compute. In this paper, we reveal two sets of distribution assumptions underlying many MI and VI based methods: Feature Independence Distribution and Geometric Mean Distribution. We systematically analyze their strengths and weaknesses and propose a logical extension called Arithmetic Mean Distribution, which leads to an unbiased and normalised estimation of probability densities. We conduct detailed empirical studies across a suite of 29 real-world classification problems and illustrate improved prediction accuracy of our methods based on the identification of more informative features, thus providing support for our theoretical findings.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.