Abstract

Feature selection is an important problem in data mining and machine learning. Over the past decade, cost-sensitive feature selection has seen increasingly rapid advances. Recently, a fast randomized algorithm was designed for this problem; unfortunately, its performance depends heavily on the size of the dataset. In this paper, we propose an optimal randomized algorithm for minimal-test-cost feature selection that adopts a restart strategy within the fast randomized algorithm. The restart strategy effectively addresses this dependence on dataset size. Our algorithm has two major stages: an addition stage and a deletion stage. In the addition stage, we obtain a subset that represents the original attributes; here, the randomized mechanism helps us select a subset rapidly. In the deletion stage, we remove redundant attributes using the restart strategy to further improve efficiency. Compared with the fast randomized algorithm, the effectiveness of our algorithm increases sharply within the same runtime, while our algorithm reduces the runtime by about a quarter over the same number of experiments.
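The abstract does not give the algorithm's details, but the two-stage structure it describes can be sketched as follows. This is a minimal illustration under assumptions of our own: `is_sufficient` stands in for whatever sufficiency test the paper uses (e.g. preserving decision ability in a rough-set model), and the restart strategy is rendered here as repeated randomized deletion passes; the function name and parameters are hypothetical, not taken from the paper.

```python
import random

def randomized_min_cost_selection(attributes, costs, is_sufficient,
                                  restarts=5, seed=0):
    """Hypothetical sketch of two-stage randomized minimal-test-cost
    feature selection: a randomized addition stage followed by a
    deletion stage with restarts.

    attributes   : list of attribute names
    costs        : dict mapping attribute -> test cost
    is_sufficient: predicate on a set of attributes (assumed monotone),
                   standing in for the paper's unspecified criterion
    """
    rng = random.Random(seed)

    # Addition stage: add attributes in a random order until the
    # subset is sufficient to represent the original attributes.
    subset = set()
    pool = list(attributes)
    rng.shuffle(pool)
    for a in pool:
        if is_sufficient(subset):
            break
        subset.add(a)

    # Deletion stage with restarts: each restart tries a fresh random
    # removal order, keeping the cheapest sufficient subset found.
    best = set(subset)
    for _ in range(restarts):
        trial = set(subset)
        order = list(trial)
        rng.shuffle(order)  # restart: new random deletion order
        for a in order:
            if is_sufficient(trial - {a}):
                trial.discard(a)  # attribute is redundant; drop it
        if sum(costs[a] for a in trial) < sum(costs[a] for a in best):
            best = trial
    return best
```

On a toy problem where sufficiency means containing attributes `a` and `b`, the addition stage may pick up extra attributes, and the deletion passes strip them back out, returning the cheap sufficient subset `{'a', 'b'}`.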
