Hypergraph based information-theoretic feature selection

Zhihong Zhang, Edwin R. Hancock

doi:10.1016/j.patrec.2012.03.021

Abstract

In many data analysis tasks, one is confronted with the problem of selecting features from very high dimensional data. Feature selection is essentially a combinatorial optimization problem, which is computationally expensive. To make it tractable, it is frequently assumed that features either influence the class variable independently or do so only through pairwise feature interactions. To relax this assumption, we draw on recent work on hypergraph clustering to select the most informative feature subset (mIFS) from a set of objects using high-order (rather than pairwise) similarities. There are two novel ingredients. First, we use a new information-theoretic criterion, referred to as the multidimensional interaction information (MII), to measure the significance of different feature combinations with respect to the class labels. Second, we use hypergraph clustering to select the mIFS, which has both low redundancy and strong discriminating power. The advantage of MII is that it incorporates third- or higher-order feature interactions. Hypergraph clustering, in turn, extracts the most informative features, and the size of the mIFS is determined automatically. Experimental results demonstrate the effectiveness of our feature selection method on a number of standard data sets.
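To illustrate why criteria limited to individual features can miss higher-order interactions, the following minimal sketch uses a toy XOR problem. It is not the paper's MII criterion or its hypergraph clustering step; the helper `mutual_information` and the toy data are assumptions introduced here purely for illustration, using a standard plug-in estimate of mutual information for discrete variables.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Toy data (assumed): the class label is the XOR of two binary features,
# with all four feature combinations equally likely.
samples = list(product([0, 1], repeat=2))
f1 = [a for a, _ in samples]
f2 = [b for _, b in samples]
label = [a ^ b for a, b in samples]

# Each feature alone carries no information about the label ...
mi_f1 = mutual_information(f1, label)
mi_f2 = mutual_information(f2, label)

# ... but the pair, treated as one joint variable, determines it fully.
mi_joint = mutual_information(list(zip(f1, f2)), label)

print(f"I(F1;C) = {mi_f1:.3f} bits")
print(f"I(F2;C) = {mi_f2:.3f} bits")
print(f"I(F1,F2;C) = {mi_joint:.3f} bits")
```

In this toy case, a selector that scores each feature independently would rank both features as uninformative, whereas a criterion that evaluates feature combinations jointly against the class label recovers the full one bit of class information.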

Full Text