Abstract

Feature selection, which identifies representative features in observed data, can increase the utility of health data for predictive diagnosis. Unlike feature extraction methods such as PCA and autoencoder-based approaches, feature selection preserves interpretability: it provides useful information about which feature subset is relevant to certain health conditions. Domain experts, such as clinicians, can learn from these relationships and use this knowledge to improve their diagnostic abilities. Mutual information (MI) based feature selection (MIBFS) is a classifier-independent approach that attempts to maximize the dependency (i.e., the MI) between the selected features and the target variable (label). However, implementing optimal MIBFS via exhaustive search over high-dimensional data can be prohibitively complex. As a result, many MIBFS approximation schemes have been developed in the literature. In this paper, we take another step forward by proposing a novel MIBFS method called Selection via Unique Relevant Information (SURI). We first quantify the unique relevant information (URI) present in each individual feature and use it to boost features with high URI. Via experiments on 6 healthcare data sets and 3 classifiers, we observe that SURI outperforms existing MIBFS methods with respect to standard classification metrics. Furthermore, using a low-dimensional data set, we investigate optimal feature selection via exhaustive search and confirm the important role of URI, further verifying the principles behind SURI.
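To make the MIBFS objective concrete, the following is a minimal sketch of classifier-independent greedy forward selection for discrete features: at each step it adds the feature whose inclusion most increases the MI between the selected subset and the label. Note this is a generic MIBFS baseline, not the paper's SURI method, which additionally weights each feature by its unique relevant information; all function names here are illustrative.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for discrete sequences:
    I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) * p(y)) )."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p(x,y) / (p(x) p(y)) = c*n / (count(x)*count(y))
        mi += p_joint * log2(c * n / (px[x] * py[y]))
    return mi

def greedy_mibfs(feature_columns, labels, k):
    """Forward selection: grow the selected subset one feature at a
    time, maximizing I(selected features; label). The joint value of
    several discrete features is represented as a tuple per sample."""
    selected = []
    remaining = list(range(len(feature_columns)))
    n = len(labels)
    for _ in range(min(k, len(remaining))):
        best_j, best_mi = None, -1.0
        for j in remaining:
            # Candidate subset = already-selected features plus feature j.
            joint = [tuple(feature_columns[i][t] for i in selected + [j])
                     for t in range(n)]
            mi = mutual_information(joint, labels)
            if mi > best_mi:
                best_j, best_mi = j, mi
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

For example, if feature 0 equals the label and feature 1 is independent noise, selecting a single feature returns `[0]`. Exhaustive search over all subsets would be optimal but costs O(2^d) evaluations, which is exactly the complexity barrier that motivates greedy approximations and, in the paper, SURI.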
