Abstract

Feature selection is a key step in building predictive models for multilabel classification. However, most existing methods ignore the costs associated with the features, such as the cost of performing diagnostic medical tests. We consider the problem of cost-constrained multilabel feature selection, which aims to select a feature subset that is relevant to multiple labels while satisfying a user-specified maximum admissible budget. This approach yields a model with high predictive power for which the cost of making a prediction for a single instance does not exceed the user-specified budget. The relevance of the feature subset and its cost must be balanced concurrently, which is nontrivial in practice because the optimal balance between them is unknown. In this paper, we propose a novel criterion for cost-constrained multilabel feature selection that combines the relevance and the cost of each candidate feature. The relevance measure is derived from a lower bound on the mutual information between the feature subset and the label vector. Moreover, we propose an effective method for determining the value of the cost factor, which controls the trade-off between relevance and cost. Experimental results on multilabel datasets with various characteristics show that the proposed method outperforms conventional methods.
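The abstract does not state the criterion itself, so the following is only a minimal sketch of the general idea under stated assumptions. It implements a greedy forward search that scores each affordable candidate feature f by relevance(f) - lambda * cost(f) and stops once no remaining feature fits within the budget B. The relevance used here is a stand-in chosen for illustration: the sum over labels of the empirical mutual information I(f; y_j) between discrete features and labels. It is not the paper's lower-bound measure, and it ignores redundancy among the selected features. The cost factor lambda (`cost_factor`) is supplied by hand rather than determined by the authors' method. All function and parameter names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Empirical mutual information I(X; Y) in nats for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def greedy_cost_constrained_selection(
    features: Dict[str, List[int]],
    labels: List[List[int]],
    costs: Dict[str, float],
    budget: float,
    cost_factor: float,
) -> List[str]:
    """Hypothetical greedy selection: repeatedly add the affordable feature
    with the highest relevance - cost_factor * cost until nothing fits."""
    selected: List[str] = []
    spent = 0.0
    remaining = set(features)
    while remaining:
        best: Optional[str] = None
        best_score = -math.inf
        for f in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + costs[f] > budget:
                continue  # adding f would exceed the admissible budget
            # ASSUMPTION: proxy relevance = sum of per-label mutual information
            relevance = sum(mutual_information(features[f], y) for y in labels)
            score = relevance - cost_factor * costs[f]
            if score > best_score:
                best, best_score = f, score
        if best is None:
            break  # no remaining feature is affordable
        selected.append(best)
        spent += costs[best]
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    labels = [[0, 0, 1, 1], [0, 1, 0, 1]]  # two binary labels, four instances
    features = {
        "a": [0, 0, 1, 1],  # copies the first label
        "b": [0, 1, 0, 1],  # copies the second label
        "c": [0, 0, 0, 0],  # constant, carries no information
    }
    costs = {"a": 1.0, "b": 3.0, "c": 0.5}
    print(greedy_cost_constrained_selection(features, labels, costs, budget=2.0, cost_factor=0.1))
    # ['a', 'c']
```

In this example `b` is as informative as `a` but never fits within the budget of 2, while `c` is added only because it is still affordable. A variant could stop as soon as the best score is not positive; which stopping rule, relevance measure and cost-factor choice the paper actually uses is not stated in the abstract.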
