Abstract
Feature selection is one of the key steps in building a predictive model for multilabel classification. However, most existing methods do not take into account the costs associated with the considered features, such as the costs of performing diagnostic medical tests. We consider the problem of cost-constrained multilabel feature selection, which aims to select a feature subset relevant to multiple labels while satisfying a user-specified maximum admissible budget. This approach makes it possible to build a model with high predictive power for which the cost of making a prediction for a single instance does not exceed the user-specified budget. In this problem, the relevance of the feature subset and its cost must be balanced simultaneously, which is nontrivial in practice because the optimal balance between them is unknown. In this paper, we propose a novel criterion for cost-constrained multilabel feature selection that combines the relevance and cost of a candidate feature. The relevance measure is derived from a lower bound of the mutual information between the feature subset and the label vector. Moreover, we propose an effective method for determining the value of the cost factor that controls the trade-off between relevance and cost. Experimental results on multilabel datasets with various characteristics demonstrate the superiority of the proposed method over conventional methods.
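The abstract does not state the criterion itself, so the following is only a rough illustration of the general shape of budget-constrained greedy selection with a cost factor λ: each candidate that still fits in the budget is scored as relevance minus λ × cost, and the best positive-scoring candidate is added. The relevance term here is a simple surrogate (the sum of empirical mutual information between the candidate and each label), not the lower-bound measure proposed in the paper, and the function names `mutual_information` and `select_features` and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * log2(c * n / (px[a] * py[b]))
    return mi


def select_features(X, Y, costs, budget, cost_factor):
    """Hypothetical greedy cost-constrained selection (illustrative sketch only).

    X: list of discrete feature columns, Y: list of discrete label columns,
    costs: cost of each feature, budget: maximum admissible total cost,
    cost_factor: weight (lambda) trading off relevance against cost.
    """
    selected, spent = [], 0.0
    remaining = set(range(len(X)))
    while True:
        best, best_score = None, 0.0  # only accept strictly positive scores
        for f in sorted(remaining):
            if spent + costs[f] > budget:
                continue  # adding f would exceed the budget
            # Surrogate relevance (NOT the paper's lower bound); it also ignores
            # redundancy with already-selected features, unlike a real criterion.
            relevance = sum(mutual_information(X[f], y) for y in Y)
            score = relevance - cost_factor * costs[f]
            if score > best_score:
                best, best_score = f, score
        if best is None:
            break  # no affordable candidate with a positive penalized score
        selected.append(best)
        spent += costs[best]
        remaining.remove(best)
    return selected, spent


# Toy data: f0 determines label y0 but is expensive, f1 determines y1 and is
# cheap, f2 is constant (uninformative).
X = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0]]
Y = [[0, 0, 1, 1], [0, 1, 0, 1]]
costs = [5.0, 1.0, 0.5]

print(select_features(X, Y, costs, budget=2.0, cost_factor=0.1))   # → ([1], 1.0)
print(select_features(X, Y, costs, budget=10.0, cost_factor=0.1))  # → ([1, 0], 6.0)
print(select_features(X, Y, costs, budget=10.0, cost_factor=0.3))  # → ([1], 1.0)
```

The last two calls show why the cost factor matters: with the same budget, raising λ from 0.1 to 0.3 makes the expensive but informative feature f0 score negative, so it is no longer selected. Choosing λ well is exactly the difficulty the abstract points to.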