Feature selection is a crucial step when building supervised predictive models. In many medical applications, features are associated with costs. For example, the diagnostic value obtained from a clinical test comes at a cost. Costs can also refer to non-financial aspects, such as the choice between an invasive exploratory surgery and a simple blood test. Traditional feature selection methods, which ignore costs, aim to choose a subset of features that maximizes the accuracy of the corresponding model. However, such a model can be impractical because the total cost of making a prediction may exceed the user-specified budget. Cost-constrained methods must take into account both the relevance of a feature and its cost. We focus on embedded feature selection methods based on a very general penalized empirical risk minimization framework that includes various loss functions. The most natural $\ell_0$-type penalty is computationally intractable, and therefore we analyze other penalties: the cost-sensitive lasso and adaptive lasso, non-convex penalties, and a method based on knockoffs. Experiments performed on real medical datasets, including the large MIMIC database, indicate that non-convex penalties give promising results; in particular, they achieve high accuracy, especially when the assumed budget is low. Our model achieved an AUC of 0.88 on the MIMIC-II dataset, in which we predict the occurrence of liver diseases based on clinical features, with a budget equal to 5% of the cost of all available features, which is significantly better than the AUC for the traditional method.
Moreover, the fraction of the feature cost wasted on noisy features in our method is usually lower than for the cost-sensitive lasso.
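
The cost-constrained penalized empirical risk minimization setting described above can be sketched as follows. The notation is an assumption made for illustration and is not taken from the abstract: $\ell$ is a generic loss, $c_j > 0$ is the cost of feature $j$, $\lambda > 0$ is a tuning parameter, and $B$ is the user-specified budget. The cost-sensitive lasso is written here in one common cost-weighted form, which may differ in detail from the variant analyzed in the paper.

```latex
% Illustrative notation only (assumed, not from the source):
%   (x_i, y_i), i = 1..n : training data with p features
%   \ell                 : generic loss (e.g. logistic or squared)
%   c_j > 0              : cost of feature j
%   \lambda > 0          : penalty level,  B : budget
\begin{align*}
  % cost-weighted \ell_0-type penalty (computationally intractable)
  \hat{\beta}^{(0)} &\in \arg\min_{\beta \in \mathbb{R}^p}
    \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(y_i, x_i^{\top}\beta\bigr)
    + \lambda \sum_{j=1}^{p} c_j \, \mathbf{1}\{\beta_j \neq 0\}, \\
  % convex surrogate: cost-weighted lasso
  \hat{\beta}^{(1)} &\in \arg\min_{\beta \in \mathbb{R}^p}
    \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(y_i, x_i^{\top}\beta\bigr)
    + \lambda \sum_{j=1}^{p} c_j \, |\beta_j|, \\
  % budget constraint on the selected features
  &\sum_{j \,:\, \hat{\beta}_j \neq 0} c_j \;\le\; B .
\end{align*}
```

Under this reading, a larger cost $c_j$ makes feature $j$ more expensive to include, so a costly feature is selected only if it reduces the empirical risk enough to offset its penalty.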