Feature selection is predominantly used in machine learning tasks, such as classification, regression, and clustering. It selects a subset of features (relevant attributes of data points) from a larger set that contributes as optimally as possible to the informativeness of the model. There are exponentially many subsets of a given set, and thus, the exhaustive search approach is only practical for problems with at most a few dozen features. In the past, there have been attempts to reduce the search space using dynamic programming. However, models that consider similarity in pairs of features alongside the quality of individual features do not provide the required optimal substructure. As a result, algorithms, which we will call suboptimal dynamic programming algorithms, find a solution that may deviate significantly from the optimal one. In this paper, we propose an iterative dynamic programming algorithm, which invertsthe order of feature processing in each iteration. Such an alternating approach allows for improving the optimization function by using the score from the previous iteration to estimate the contribution of unprocessed features. The iterative process is proven to converge and terminates when the solution does not change in three successive iterations or when the number of iterations reaches the threshold. Results in more than 95% of tests align with those of the exhaustive search approach, being competitive and often superior to the reference greedy approach. Validation was carried out by comparing the scores of output feature subsets and examining the accuracy of different classifiers learned on these features across nine real-world applications, considering different scenarios with various numbers of features and samples. In the context of feature selection, the proposed algorithm can be characterized as a robust filter method that can improve machine learning models regardless of dataset size. However, we expect that the idea of alternating suboptimal optimization will soon be generalized to tasks beyond feature selection.
Read full abstract