Abstract

In data mining applications, feature selection is a fundamental problem that can improve the generalization capability of learned patterns. Recent developments in this field have led to renewed interest in cost-sensitive feature selection. Minimal-cost feature selection seeks a trade-off between test costs and misclassification costs. However, this problem has so far been addressed only on nominal data. In this paper, we consider numerical data with measurement errors and propose a backtracking approach to this problem. First, we build a data model in which measurement errors follow a normal distribution. To deal with this model, we construct the neighborhood of each data item through the confidence interval; compared with discretized intervals, neighborhoods better preserve the information in the data. We then redefine the minimal-total-cost feature selection problem on the neighborhood model. Finally, we propose a backtracking algorithm with three effective pruning techniques to solve it. Experimental results indicate that the pruning techniques are effective and that the algorithm is efficient on datasets with nearly one thousand objects.
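The confidence-interval neighborhood described above can be sketched as follows. This is a hypothetical illustration only: the function name, the symmetric radius 2·z·σ (allowing an error of up to z·σ on each of the two measurements being compared), and the example values are assumptions for exposition, not the paper's exact definition.

```python
def neighborhood(values, i, sigma, z=1.96):
    """Indices of objects whose observed value on one numerical attribute
    lies within the confidence-interval radius of object i.

    values : observed attribute values for all objects
    i      : index of the object whose neighborhood we build
    sigma  : standard deviation of the normally distributed measurement error
    z      : quantile for the confidence level (1.96 ~ 95% confidence)
    """
    # Each of the two compared measurements may deviate from its true value
    # by up to z * sigma, so two objects are treated as indiscernible
    # (neighbors) when their observed values differ by at most 2 * z * sigma.
    radius = 2 * z * sigma
    return [j for j, v in enumerate(values) if abs(v - values[i]) <= radius]


# Example: with sigma = 0.1 the radius is about 0.392, so the first two
# objects are neighbors while the third is not.
print(neighborhood([1.0, 1.1, 5.0], 0, 0.1))  # → [0, 1]
```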
