Abstract

With the increasing dimensionality of data in many application fields, feature selection, as an essential step to avoid the curse of dimensionality and enhance model generalization, is attracting more and more research attention. However, most existing feature selection methods assume that all features have the same cost. These efforts mainly focus on features’ relevance to learning performance while neglecting the cost of obtaining them. Feature cost is a crucial factor that needs to be considered in feature selection, especially in real-world applications. For example, in the process of medical diagnosis, each feature may have a very different testing cost. To select low-cost subsets of informative features, in this paper, we propose a stratified random forest-based cost-sensitive feature selection method. Unlike commonly used two-step cost-sensitive feature selection approaches, our model incorporates the cost of features into the construction process of each base decision tree; that is, the cost and the discriminative performance of each feature are optimized simultaneously. Moreover, we adopt a stratified sampling method to enhance the performance of the selected feature subset on high-dimensional data. A series of experimental results show that, compared with state-of-the-art methods, the proposed approach can lower the cost of the selected feature subset while maintaining comparable learning performance.
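The abstract does not give the paper's exact split criterion, so the following is only a minimal sketch of the general idea of embedding feature cost into tree construction. It assumes a cost-penalized information gain of the form gain(f) / (1 + cost(f)), in the spirit of classical cost-sensitive criteria such as EG2 and CS-ID3; the function names, the penalty form, and the toy data are all illustrative, not the authors' method.

```python
import numpy as np


def entropy(y):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def info_gain(x, y, threshold):
    """Information gain of splitting feature values x at threshold."""
    left = x <= threshold
    right = ~left
    if left.sum() == 0 or right.sum() == 0:
        return 0.0
    n = len(y)
    return (entropy(y)
            - (left.sum() / n) * entropy(y[left])
            - (right.sum() / n) * entropy(y[right]))


def best_cost_sensitive_split(X, y, costs, candidate_features):
    """Pick the (feature, threshold) maximizing a cost-penalized gain.

    costs[j] is the acquisition cost of feature j; a higher-cost feature
    must offer proportionally more gain to be chosen, so cost and
    performance are traded off at every split rather than in a
    separate post-hoc step.
    """
    best = (None, None, -np.inf)
    for j in candidate_features:
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds
            score = info_gain(X[:, j], y, t) / (1.0 + costs[j])
            if score > best[2]:
                best = (j, t, score)
    return best


# Toy usage: feature 1 is informative but expensive; feature 0 is cheap.
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = (X[:, 1] > 0.5).astype(int)
costs = np.array([0.1, 5.0])
print(best_cost_sensitive_split(X, y, costs, [0, 1]))
```

In a full random forest, a criterion like this would be evaluated at each node over a random subset of candidate features; the stratified sampling the abstract mentions would presumably shape how that candidate subset is drawn for high-dimensional data.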
