Feature–granularity selection with variable costs for hybrid data

Shujiao Liao, Qingxin Zhu, Yuhua Qian

doi:10.1007/s00500-019-03854-2

Abstract

In recent years, cost-sensitive feature selection has drawn much attention. However, some issues remain to be investigated. In particular, most existing work deals with single-type data, and only a few studies address hybrid data. Moreover, both the test cost of a feature and the misclassification cost of an object are often assumed to be fixed, whereas in practice they usually vary with the error range of the data, or equivalently the data granularity. In view of these facts, a feature–granularity selection approach is proposed that selects the optimal feature subset and the optimal data granularity simultaneously so as to minimize the total cost of processing hybrid data. In the approach, an adaptive neighborhood model is first constructed, in which the neighborhood granules are generated adaptively according to the types of features. Then, multiple kinds of variable cost settings are discussed according to practical situations, and finally an optimal feature–granularity selection algorithm is designed. Experimental results on sixteen UCI datasets show that the proposed algorithm achieves a good trade-off among feature dimension reduction, data granularity selection and total cost minimization. The influences of different cost settings on feature–granularity selection are also discussed thoroughly, which provides some feasible schemes for decision making.
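To make the idea of joint feature and granularity selection concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper. It assumes a neighborhood in which nominal features must match exactly and numeric features must lie within an error range `delta`, a hypothetical test-cost function in which a numeric feature becomes cheaper as `delta` grows, a majority-vote rule for counting misclassified objects, and an exhaustive search over feature subsets and candidate `delta` values. All function names, the cost formula and the search strategy are assumptions introduced for illustration only.

```python
from collections import Counter
from itertools import combinations


def neighborhood(data, i, features, is_numeric, delta):
    """Indices of objects that fall in the neighborhood of object i.

    Assumed rule: numeric features must differ by at most delta,
    nominal features must be equal.
    """
    def close(f, j):
        if is_numeric[f]:
            return abs(data[i][f] - data[j][f]) <= delta
        return data[i][f] == data[j][f]

    return [j for j in range(len(data)) if all(close(f, j) for f in features)]


def total_cost(data, labels, features, is_numeric, delta, base_cost, mis_cost):
    """Test cost plus average misclassification cost (hypothetical formulas)."""
    # Assumed variable test cost: a coarser measurement (larger delta)
    # of a numeric feature is cheaper; nominal features cost a fixed amount.
    test = sum(
        (base_cost[f] * (1.0 - delta)) if is_numeric[f] else base_cost[f]
        for f in features
    )
    errors = 0
    for i in range(len(data)):
        nb = neighborhood(data, i, features, is_numeric, delta)
        majority = Counter(labels[j] for j in nb).most_common(1)[0][0]
        errors += labels[i] != majority
    return test + mis_cost * errors / len(data)


def select(data, labels, is_numeric, deltas, base_cost, mis_cost):
    """Exhaustively search non-empty feature subsets and candidate deltas."""
    best = None
    n = len(is_numeric)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            for delta in deltas:
                cost = total_cost(data, labels, subset, is_numeric,
                                  delta, base_cost, mis_cost)
                if best is None or cost < best[2]:
                    best = (subset, delta, cost)
    return best  # (feature subset, delta, total cost)
```

Exhaustive search is exponential in the number of features, so this sketch is only practical for very small examples; it is meant to show how feature subset and granularity interact in a single cost, not to suggest an efficient procedure.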

Full Text