We consider a multi-label all-relevant feature selection task, which is more general than the classical minimal-optimal subset task. Whereas the goal of minimal-optimal methods is to find the smallest subset of features that allows accurate prediction of the labels, the objective of all-relevant methods is to identify all features related to the target labels, including both strongly and weakly relevant ones. The all-relevant task has received much interest in fields where discovering the dependency structure between features and target variables is more important than prediction itself, e.g., in medical and bioinformatics applications. In this paper, we formally describe the all-relevant problem for multi-label classification using an information-theoretic approach. We propose a relevancy score and an efficient method for its calculation based on lower bounds of conditional mutual information. A further practical issue is how to separate relevant features from irrelevant ones; to find a threshold, we propose a testing procedure based on a permutation scheme. Finally, empirical evaluation of all-relevant methods requires a specific approach: we consider a large variety of simulated datasets representing different dependency structures and containing various types of interactions. Empirical results on the simulated datasets and a large clinical database demonstrate that the proposed method successfully identifies relevant features.
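To illustrate the general idea of permutation-based thresholding of an information-theoretic relevancy score, the following is a minimal sketch, not the paper's exact procedure: it uses a plain (unconditional) empirical mutual information score in place of the conditional-mutual-information lower bounds proposed in the paper, and a single binary label rather than a multi-label target. The function names (`mutual_information`, `permutation_threshold`) and the quantile-based cutoff are illustrative assumptions.

```python
import numpy as np

def mutual_information(x, y):
    # Empirical mutual information (in nats) between two discrete variables.
    # Stand-in for the paper's CMI-based relevancy score.
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                px, py = np.mean(x == xv), np.mean(y == yv)
                mi += pxy * np.log(pxy / (px * py))
    return mi

def permutation_threshold(x, y, n_perm=200, alpha=0.05, rng=None):
    # Permuting x breaks any dependence on y, so the permuted scores
    # approximate the null distribution of the score for an irrelevant
    # feature; the (1 - alpha) quantile serves as the relevance threshold.
    rng = np.random.default_rng(rng)
    null_scores = [mutual_information(rng.permutation(x), y)
                   for _ in range(n_perm)]
    return np.quantile(null_scores, 1.0 - alpha)

# A feature is declared relevant if its observed score exceeds the
# permutation threshold:
#   relevant = mutual_information(x, y) > permutation_threshold(x, y)
```

The design choice behind such schemes is that the threshold adapts to sample size and label distribution automatically, instead of requiring a fixed score cutoff chosen by hand.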