Abstract

A fundamental challenge in learning is the presence of nonlinear redundancies and dependencies in the data. To address this, we propose a Fourier-based approach to characterize feature redundancies in unsupervised learning and feature-label dependencies in the supervised variant of the problem. We first develop a novel Fourier expansion for functions (more generally, stochastic mappings) of correlated binary random variables. This expansion generalizes the standard Fourier expansion on the Boolean cube beyond product probability spaces. As an important application of this analysis, we investigate learning with feature subset selection. In the unsupervised variant of this problem, we characterize feature redundancies via the Shannon entropy and group the features into sufficiently informative ones and redundant ones. We then connect this characterization to the proposed Fourier expansion and derive an upper bound on the joint entropy. Based on this bound, we propose a measure that quantifies feature redundancies and present an unsupervised learning algorithm. We test our method on various real-world and synthetic datasets and demonstrate improvements over conventional unsupervised feature selection techniques. Next, we investigate supervised feature subset selection and reformulate it in the Fourier domain. By relating the Bayes error rate to the Fourier coefficients, we demonstrate that the Fourier expansion provides a powerful tool for characterizing nonlinear feature-label dependencies. Further, we introduce a computationally efficient measure for selecting relevant features. Via a theoretical analysis, we show that our proposed measure provably finds asymptotically optimal feature subsets. Lastly, we present an algorithm based on this measure and demonstrate through numerical experiments its improvements over various supervised feature selection algorithms.
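The abstract does not say how the expansion for correlated variables is constructed. As a hedged illustration only, not the authors' construction, one generic way to obtain a Fourier-type orthonormal basis for functions of correlated binary variables is to run Gram-Schmidt on the parity functions χ_S(x) = ∏_{i∈S} x_i under the inner product ⟨f, g⟩_P = E_P[f(X) g(X)]. The sketch below assumes ±1-valued variables, a distribution P with full support given as an explicit probability table, and small n. The helper names (`orthonormal_basis`, `fourier_coefficients`, `reconstruct`) are hypothetical. Under the uniform (product) distribution the procedure returns the standard Walsh parities, so it reduces to the usual Boolean-cube expansion.

```python
from itertools import combinations, product
from math import prod, sqrt

def subsets(n):
    """All subsets of {0, ..., n-1} as sorted tuples, smallest first."""
    return [s for k in range(n + 1) for s in combinations(range(n), k)]

def inner(f, g, p):
    """Inner product <f, g>_P = E_P[f(X) g(X)]; functions are dicts over points."""
    return sum(p[x] * f[x] * g[x] for x in p)

def orthonormal_basis(n, p):
    """Modified Gram-Schmidt on the parities chi_S(x) = prod_{i in S} x_i under <.,.>_P.

    Hypothetical helper. Assumes p(x) > 0 for every x in {-1, +1}^n, so the
    parities stay linearly independent and every norm below is strictly positive.
    """
    points = list(product((-1, 1), repeat=n))
    basis = {}
    for s in subsets(n):
        # Start from the plain parity function chi_S (empty product is 1).
        v = {x: float(prod(x[i] for i in s)) for x in points}
        # Remove its components along the basis functions built so far.
        for psi in basis.values():
            c = inner(v, psi, p)
            v = {x: v[x] - c * psi[x] for x in points}
        norm = sqrt(inner(v, v, p))
        basis[s] = {x: v[x] / norm for x in points}
    return basis

def fourier_coefficients(f, basis, p):
    """Coefficients f_hat(S) = <f, psi_S>_P of f in the orthonormal basis."""
    return {s: inner(f, psi, p) for s, psi in basis.items()}

def reconstruct(coeffs, basis, x):
    """Evaluate the expansion sum_S f_hat(S) psi_S(x) at a single point x."""
    return sum(c * basis[s][x] for s, c in coeffs.items())

if __name__ == "__main__":
    # Two positively correlated bits: E[X0 X1] = 0.6, so P is not a product distribution.
    p = {(1, 1): 0.4, (-1, -1): 0.4, (1, -1): 0.1, (-1, 1): 0.1}
    f = {x: 1.0 if x == (1, 1) else 0.0 for x in p}  # AND of the two bits
    basis = orthonormal_basis(2, p)
    coeffs = fourier_coefficients(f, basis, p)
    print({s: round(c, 4) for s, c in coeffs.items()})
    # → {(): 0.4, (0,): 0.4, (1,): 0.2, (0, 1): 0.2}
```

Because Gram-Schmidt depends on the order in which the parities are processed (here by subset size), the result is one of many valid orthonormal bases: the expansion is exact for any full-support P, but individual coefficients can change with the ordering.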
