Abstract

The data-driven society of today generates very large volumes of high-dimensional data. Processing such data efficiently with established methods is an increasing challenge, and novel approaches are needed. Feature selection is a traditional data pre-processing strategy that can be used to reduce the volume and complexity of data. It selects a subset of data features so that the data volume is reduced while its information content is maintained. Evolutionary feature selection methods have already shown a good ability to identify, in very-high-dimensional data sets, feature subsets that satisfy selected criteria. Their efficiency depends, among other factors, on the feature subset representation and the definition of the objective function. This work employs a recent genetic algorithm for fixed-length subset selection to find feature subsets on the basis of their entropy, estimated by a fast data compression method. The soundness of this new fitness criterion and the usefulness of the selected feature subsets for practical data mining are evaluated using well-known data sets and several widely used classification algorithms.
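To make the fitness criterion concrete, the following is a minimal sketch of how the entropy of a fixed-length feature subset could be estimated by a fast compression method and used as a fitness value. The function names (compression_entropy, subset_fitness), the choice of zlib as the compressor, and the normalisation by the uncompressed size are illustrative assumptions, not the authors' implementation.

```python
import zlib
import numpy as np


def compression_entropy(X: np.ndarray, subset: list[int]) -> float:
    """Proxy for the entropy of the selected feature columns: the size of
    their zlib-compressed byte representation, normalised by the
    uncompressed size (values closer to 1 suggest higher entropy)."""
    raw = np.ascontiguousarray(X[:, subset]).tobytes()
    compressed = zlib.compress(raw, level=6)
    return len(compressed) / len(raw)


def subset_fitness(X: np.ndarray, subset: list[int]) -> float:
    # The genetic algorithm evaluates each fixed-length subset with this
    # value; whether it is minimised or maximised depends on the chosen
    # selection criterion.
    return compression_entropy(X, subset)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 4, size=(500, 20)).astype(np.uint8)  # toy data set
    print(subset_fitness(X, [0, 3, 7, 12]))  # fitness of one candidate subset
```

In such a scheme the compressed size serves only as a fast, approximate stand-in for entropy; the subsequent evaluation with standard classification algorithms, as described above, is what validates whether the criterion selects useful features.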
