Abstract

An essential step in data mining and machine learning is selecting a useful feature subset from the high-dimensional feature space. Many existing feature selection algorithms consider only accuracy and ignore both error types and test costs. In this paper, we propose a cost-sensitive embedded feature selection algorithm based on the ℓ2,1-norm that minimizes the total cost rather than maximizing accuracy. The algorithm jointly minimizes an ℓ2,1-norm loss function weighted by misclassification costs; this loss is robust to outliers. We also add an orthogonality constraint to guarantee that the selected features are independent of one another. The proposed algorithm takes both test costs and misclassification costs into account simultaneously, which makes it more realistic than existing feature selection algorithms. Finally, we derive an iterative updating algorithm from the objective function that makes cost-sensitive feature selection more efficient. Extensive experimental results on publicly available datasets demonstrate that the proposed algorithm is effective: it selects a low-cost feature subset and achieves better performance than other feature selection algorithms in real-world applications.
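To make the setup concrete, a typical objective for this kind of cost-sensitive ℓ2,1-norm embedded feature selection has the following form. This is a sketch based on the abstract's description; the paper's exact weighting of the two cost terms may differ:

\min_{W}\; \sum_{j=1}^{n} c_j \,\bigl\| (XW - Y)_{j\cdot} \bigr\|_2 \;+\; \lambda \sum_{i=1}^{d} t_i \,\bigl\| W_{i\cdot} \bigr\|_2 \quad \text{s.t. } W^{\top} W = I,

where X \in \mathbb{R}^{n \times d} is the data matrix, Y \in \mathbb{R}^{n \times k} the label matrix, c_j the misclassification cost attached to sample j, t_i the test cost of feature i, and \lambda a trade-off parameter. The first term is the cost-weighted ℓ2,1-norm loss (robust to outliers because per-sample residuals enter unsquared), the second term drives whole rows of W to zero so that features with near-zero rows are discarded, and the orthogonality constraint keeps the selected features independent.

The sketch below illustrates the standard iteratively reweighted update used for joint ℓ2,1-norm objectives of this shape (in the style of Nie et al.'s ℓ2,1 solver). It is an assumption-based illustration, not the paper's exact algorithm: it omits the orthogonality constraint for simplicity, and the names l21_cost_feature_ranking, mc, and tc are hypothetical.

import numpy as np

def l21_cost_feature_ranking(X, Y, mc, tc, lam=1.0, n_iter=50, eps=1e-8):
    """Iteratively reweighted solver for (hypothetical sketch, not the paper's exact method):
        min_W  sum_j mc[j] * ||(X W - Y)_j||_2  +  lam * sum_i tc[i] * ||W_i||_2
    X: (n, d) data, Y: (n, k) labels, mc: (n,) misclassification costs,
    tc: (d,) test costs. Returns one score per feature; near-zero scores
    mean the feature is dropped."""
    n, d = X.shape
    # Ridge initialization keeps the row norms nonzero at the start.
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    for _ in range(n_iter):
        # One weight per feature: tc[i] / (2 * ||W_i||_2).
        Dw = tc / (2.0 * np.linalg.norm(W, axis=1) + eps)
        # One weight per sample: mc[j] / (2 * ||(X W - Y)_j||_2).
        R = X @ W - Y
        De = mc / (2.0 * np.linalg.norm(R, axis=1) + eps)
        # Closed-form weighted ridge update that decreases the objective.
        A = X.T @ (De[:, None] * X) + lam * np.diag(Dw)
        W = np.linalg.solve(A, X.T @ (De[:, None] * Y))
    return np.linalg.norm(W, axis=1)

Selecting the features whose scores exceed a threshold (or the top-m scores) then trades classification performance against the accumulated test cost of the chosen subset.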
