Abstract

Feature selection is an important pre-processing stage for nearly all data analysis pipelines in applications such as genomics, the life and biomedical sciences, and cyber-security. These application-driven fields rely on feature selection not only to build classifiers but also to discover knowledge in the data. When classification is the end objective, these dimensionality reduction methods also address the curse of dimensionality. Feature selection is formally defined as the process of finding the minimum subset of features that provides the maximum predictive power. In this work, we address the problem of feature selection using information theory. Many information-theoretic approaches perform a greedy forward optimization; however, efficiency is a concern since most existing greedy algorithms work only in a centralized fashion (e.g., a single sequential thread). Recent work has consolidated approximately 20 years of information-theoretic feature selection methods into a unified conditional likelihood maximization framework, which showed that the joint mutual information maximization (JMI) objective is generally the best option (i.e., as empirically evaluated on many datasets). Unfortunately, JMI is infeasible on extremely high-dimensional datasets. In this contribution, we propose a semi-parallel implementation of JMI that exploits the observation that multiple features can be selected at the same time during the greedy forward search. This work also bridges the gap between greedy feature selection and parallel feature selection, which can deliver significant speedups for large-scale feature selection tasks. We compared the proposed approach against the traditional greedy forward implementation of JMI on several UCI datasets. The experimental results demonstrate that the proposed approach selects the same features that traditional JMI returns; however, the features are selected in a fraction of the time.
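To make the baseline concrete, the following is a minimal sketch of the traditional greedy forward search with the JMI criterion that the paper builds on. It assumes integer-coded discrete features; the function names (`mutual_info`, `jmi_forward`) and the pair-encoding trick are illustrative choices, not the authors' implementation.

```python
import numpy as np

def mutual_info(x, y):
    """I(X; Y) in nats for discrete 1-D integer arrays, via plug-in estimates."""
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # joint probability estimate
            if pxy > 0:
                px = np.mean(x == xv)
                py = np.mean(y == yv)
                mi += pxy * np.log(pxy / (px * py))
    return mi

def jmi_forward(X, y, k):
    """Greedy forward selection with the JMI criterion.

    A candidate feature f is scored by sum over already-selected s of
    I(X_f, X_s; Y), where the pair (X_f, X_s) is treated as a single
    joint discrete variable. Returns the indices of k selected features.
    """
    n, d = X.shape
    # Seed with the feature having the highest marginal mutual information.
    selected = [max(range(d), key=lambda f: mutual_info(X[:, f], y))]
    while len(selected) < k:
        remaining = [f for f in range(d) if f not in selected]
        def score(f):
            # Encode each pair (X_f, X_s) as one discrete variable
            # (valid for non-negative integer-coded features).
            return sum(
                mutual_info(X[:, f] * (X[:, s].max() + 1) + X[:, s], y)
                for s in selected
            )
        selected.append(max(remaining, key=score))
    return selected
```

Each round re-scores every remaining feature against all selected ones, which is the sequential bottleneck the proposed semi-parallel variant targets: scores for different candidates are independent and can be computed concurrently.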
