Abstract
In data mining, feature selection is a fundamental problem that can improve the generalization capability of learned patterns. Recent developments in this field have renewed interest in cost-sensitive feature selection. Minimal-cost feature selection seeks a trade-off between test costs and misclassification costs; however, this problem has so far been addressed only on nominal data. In this paper, we consider numerical data with measurement errors and propose a backtracking approach to this problem. First, we build a data model with normally distributed measurement errors. To handle this model, the neighborhood of each data item is constructed through a confidence interval. Compared with discretized intervals, neighborhoods preserve more of the information in the data. We then redefine the minimal total cost feature selection problem on the neighborhood model. Finally, we propose a backtracking algorithm with three effective pruning techniques to solve it. Experimental results indicate that the pruning techniques are effective and that the algorithm is efficient on datasets with nearly one thousand objects.
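The confidence-interval neighborhood described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes measurement errors are drawn from N(0, sigma^2), so that two readings of the same true value differ by at most twice the confidence radius z*sigma at the chosen confidence level (z = 1.96 for 95%). The function name and the example readings are hypothetical.

```python
def neighborhood(values, i, sigma, z=1.96):
    """Return indices of objects indistinguishable from object i on one feature.

    Assumption (not from the paper): errors ~ N(0, sigma^2), so two noisy
    readings of the same underlying value differ by at most 2 * z * sigma
    with the chosen confidence (z = 1.96 gives roughly 95%).
    """
    radius = 2 * z * sigma
    return [j for j, v in enumerate(values) if abs(v - values[i]) <= radius]

# Hypothetical feature readings for four objects; with sigma = 0.02 the
# radius is about 0.078, so only the first two objects fall in the
# neighborhood of object 0.
readings = [0.10, 0.12, 0.30, 0.95]
print(neighborhood(readings, 0, sigma=0.02))
```

Unlike a fixed discretization grid, the neighborhood is centered on each object, so two close readings near a bin boundary are never artificially separated.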