Application of Feature Selection Methods for Improving Classification Accuracy and Run-Time: A Comparison of Performance on Real-World Datasets

Yaseen Hamza Pullissery, Andrew Starkey

doi:10.1109/icaaic56838.2023.10140952

Abstract

Big data are produced in high volume, velocity, veracity, and variety. They present unprecedented opportunities to improve lives that are deeply rooted in the use of information. Machine learning and data mining techniques are employed to find useful patterns and insights in the data. Feature selection is among the most important steps of machine learning and data mining processes and, if carefully configured, directly affects the performance of learning algorithms and reduces their computational time. However, discovering useful patterns in the data is challenging because of high dimensionality and data quality issues. As a result, experienced professionals are required to provide guidance or input in almost all of the processes, mainly the feature selection process and the selection and tuning of optimal parameters for the classification phase, which makes data mining a manual, expensive task that is unsuitable for data generated at high velocity. This research study provides an overview of feature selection methods and performs feature selection on real-world datasets. Further, the impact of feature selection on classification accuracy and runtime is evaluated using Support Vector Machine and Deep Learning. Finally, this research study highlights the importance of, and need for, improved feature selection methods for sustainable, efficient, and green algorithms.
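The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: the synthetic dataset, the ANOVA F-score filter, the choice of 20 retained features, and the RBF kernel are all assumptions made here for illustration, using scikit-learn. The sketch trains a Support Vector Machine once on all features and once after filter-based feature selection, and reports test accuracy and fit time for each.

```python
import time

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model, then return (test accuracy, fit time in seconds)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start
    return model.score(X_test, y_test), fit_seconds


# Assumption: a synthetic stand-in for a real-world dataset,
# with 200 features of which only 10 are informative.
X, y = make_classification(
    n_samples=1000, n_features=200, n_informative=10, n_redundant=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Baseline: SVM trained on all features.
baseline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Assumption: ANOVA F-score filter keeping the top 20 features, then the same SVM.
# The selector sits inside the pipeline so it is fitted on training data only.
selected = make_pipeline(
    StandardScaler(), SelectKBest(f_classif, k=20), SVC(kernel="rbf")
)

results = {
    "all features": evaluate(baseline, X_train, X_test, y_train, y_test),
    "top 20 features": evaluate(selected, X_train, X_test, y_train, y_test),
}
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.3f}, fit time={seconds:.3f}s")
```

Measured times vary by machine and run, so a single timing is only indicative; repeating the fit several times and averaging would give a steadier comparison.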

Full Text