Abstract

Irrelevant and redundant features increase the computation and storage requirements, and the extraction of required information becomes challenging. Feature selection enables us to extract the useful information from the given data. Streaming feature selection is an emerging field for the processing of high-dimensional data, where the total number of attributes may be infinite or unknown while the number of data instances is fixed. We propose a hybrid feature selection approach for streaming features using ant colony optimization with symmetric uncertainty (ACO-SU). The proposed approach tests the usefulness of the incoming features and removes the redundant features. The algorithm updates the obtained feature set when a new feature arrives. We evaluate our approach on fourteen datasets from the UCI repository. The results show that our approach achieves better accuracy with a minimal number of features compared with the existing methods.

Highlights

  • Irrelevant and redundant features increase the computation and storage requirements, and the extraction of required information becomes challenging

  • We propose a hybrid feature selection approach for streaming features using ant colony optimization with symmetric uncertainty (ACO-SU). The proposed approach tests the usefulness of the incoming features and removes the redundant features. The algorithm updates the obtained feature set when a new feature arrives

  • For the early termination of the selection method, we find the association among features by exploiting the filter method, symmetric uncertainty (SU), which is a modification of information gain [14]. The proposed approach is incremental in nature, where a complete retraining is not required if a new feature arrives. Thus, the computational time compared to a pure wrapper method can be reduced
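
For reference, symmetric uncertainty is commonly defined by normalizing information gain by the entropies of the two variables. The notation below (X, Y, H, IG) is the standard textbook form and is an assumption here, since this excerpt does not show the paper's own notation.

```latex
\begin{aligned}
H(X) &= -\sum_{x} p(x)\,\log_2 p(x), \\
IG(X \mid Y) &= H(X) - H(X \mid Y), \\
SU(X, Y) &= \frac{2\, IG(X \mid Y)}{H(X) + H(Y)}.
\end{aligned}
```

Under this definition SU is symmetric in X and Y and lies between 0 (the variables are independent) and 1 (each fully determines the other), which makes it usable both for feature-to-class relevance and for feature-to-feature redundancy.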


Summary

Introduction

Irrelevant and redundant features increase computation and storage requirements and make it harder to extract the required information. Learning algorithms tend to overfit when the feature set is large and the number of data points is small. Dimensionality reduction techniques such as feature selection and feature extraction need to be applied to deal with this problem [7]. Given the set of all features in advance, traditional feature selection methods pick a subset of relevant features by eliminating redundant and irrelevant information [4, 8]. Selecting good features may not be feasible in the case of streaming features, because every time a new feature arrives the classifier must be retrained to measure the performance of the model. Streaming feature selection (SFS) is an emerging field that offers benefits over traditional feature selection methods: since it works in an online manner, it performs feature selection without storing the huge volume of data. Unlike the existing forward-only, search-based wrapper approaches that consider each incoming feature only once, the proposed approach uses a forward-backward search to select the most appropriate feature set.
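
The idea of testing each arriving feature for usefulness and then revisiting the already selected set can be illustrated with a minimal filter-only sketch. This is not the ACO-SU algorithm: it omits the ant colony search and the classifier entirely, and the function names, the two thresholds (`relevance_min`, `redundancy_max`) and the toy data are all assumptions made for illustration. It only shows how symmetric uncertainty can drive a forward step (accept a relevant feature) and a backward step (drop previously kept features that the new one makes redundant).

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def stream_select(feature_stream, labels, relevance_min=0.1, redundancy_max=0.8):
    """Hypothetical streaming filter; thresholds are illustrative assumptions."""
    selected = {}  # feature name -> (column, relevance to the class)
    for name, column in feature_stream:
        relevance = symmetric_uncertainty(column, labels)
        if relevance < relevance_min:
            continue  # forward step: discard an irrelevant feature
        redundant_with = [
            kept for kept, (kept_col, _) in selected.items()
            if symmetric_uncertainty(column, kept_col) >= redundancy_max
        ]
        if any(selected[kept][1] >= relevance for kept in redundant_with):
            continue  # an equally or more relevant kept feature covers it
        for kept in redundant_with:
            del selected[kept]  # backward step: drop features it supersedes
        selected[name] = (column, relevance)
    return list(selected)


# Toy data (made up): features arrive one at a time; instances are fixed.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
noise = [0, 1, 0, 1, 0, 0, 1, 1]   # independent of the labels
weak = [0, 0, 1, 1, 0, 1, 0, 0]    # labels with one value flipped
exact = list(labels)               # perfectly predictive
dup = [1 - v for v in labels]      # same information as `exact`
stream = [("noise", noise), ("weak", weak), ("exact", exact), ("dup", dup)]

result = stream_select(stream, labels, redundancy_max=0.5)
print(result)
```

In this toy run the independent feature is rejected on arrival, the weaker feature is accepted and later removed when a more relevant, overlapping feature arrives, and the final duplicate is skipped. No classifier is retrained at any point, which is the property that a filter measure contributes to a hybrid method.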

