Abstract
In data processing and analysis, a dataset may contain a large number of features that restrict its usability and applicability, so its dimensionality needs to be reduced. Feature selection is the process of removing as many redundant and irrelevant features as possible from the original dataset to improve the efficiency of the mining process. This paper presents a study that evaluates and compares filter and wrapper methods as feature selection approaches in terms of classification accuracy and time complexity. The Naive Bayes classifier and three classification datasets from the UCI repository are used in the classification procedure. To investigate the effect of the feature selection methods, they are applied to datasets with different characteristics to obtain selected feature vectors, which are then classified according to each dataset's categories. The datasets used in this paper are the Iris, Ionosphere, and Ovarian Cancer datasets. Experimental results indicate that the filter and wrapper methods provide approximately equal classification accuracy: the average accuracy on the Ionosphere and Ovarian Cancer datasets is 0.78 and 0.91, respectively, for the same selected feature vectors. For the Iris dataset, the filter method outperforms the wrapper method by achieving the same accuracy using only half the number of selected features. The results also show that the filter method is superior in terms of execution time.
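The contrast between the two families of methods can be illustrated with a minimal sketch. It is not the implementation used in the study: the abstract does not name the specific filter or wrapper algorithms, so a Fisher-style score (filter) and greedy forward selection (wrapper) are assumed here as stand-ins. The data is synthetic rather than taken from the UCI repository, and all function names (`make_data`, `filter_select`, `wrapper_select`, `nb_accuracy`) are hypothetical. The sketch counts how many Naive Bayes models each approach trains during selection, which is the main driver of the execution-time difference.

```python
import math
import random
from statistics import mean, pvariance


def make_data(n=300, seed=0):
    """Synthetic two-class data (assumption): columns 0, 1 informative; 2, 3 noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(4.0 * label, 1.0), rng.gauss(-2.0 * label, 1.0),
                  rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


def class_stats(X, y, j):
    """Per-class (mean, variance) of column j."""
    out = {}
    for c in sorted(set(y)):
        vals = [x[j] for x, t in zip(X, y) if t == c]
        out[c] = (mean(vals), pvariance(vals) + 1e-9)
    return out


def nb_accuracy(X_fit, y_fit, X_eval, y_eval, cols):
    """Fit Gaussian Naive Bayes on the chosen columns and score it on X_eval."""
    stats = {j: class_stats(X_fit, y_fit, j) for j in cols}
    priors = {c: y_fit.count(c) / len(y_fit) for c in set(y_fit)}

    def log_post(x, c):
        total = math.log(priors[c])
        for j in cols:
            m, v = stats[j][c]
            total += -0.5 * math.log(2 * math.pi * v) - (x[j] - m) ** 2 / (2 * v)
        return total

    hits = sum(max(priors, key=lambda c: log_post(x, c)) == t
               for x, t in zip(X_eval, y_eval))
    return hits / len(y_eval)


def filter_select(X, y, k):
    """Filter: rank columns by a Fisher-style score; no classifier is trained."""
    def score(j):
        (m0, v0), (m1, v1) = class_stats(X, y, j).values()
        return (m0 - m1) ** 2 / (v0 + v1)
    ranked = sorted(range(len(X[0])), key=score, reverse=True)
    return sorted(ranked[:k])


def wrapper_select(X_fit, y_fit, X_val, y_val, k):
    """Wrapper: greedy forward selection scored by validation accuracy."""
    chosen, trainings = [], 0
    while len(chosen) < k:
        candidates = [j for j in range(len(X_fit[0])) if j not in chosen]
        scores = {j: nb_accuracy(X_fit, y_fit, X_val, y_val, chosen + [j])
                  for j in candidates}
        trainings += len(candidates)  # one classifier trained per candidate
        chosen.append(max(scores, key=scores.get))
    return sorted(chosen), trainings


X, y = make_data()
X_fit, y_fit = X[:100], y[:100]
X_val, y_val = X[100:200], y[100:200]
X_test, y_test = X[200:], y[200:]

filter_cols = filter_select(X_fit, y_fit, k=2)
wrapper_cols, wrapper_trainings = wrapper_select(X_fit, y_fit, X_val, y_val, k=2)
filter_acc = nb_accuracy(X_fit, y_fit, X_test, y_test, filter_cols)
wrapper_acc = nb_accuracy(X_fit, y_fit, X_test, y_test, wrapper_cols)

print("filter :", filter_cols, "accuracy", filter_acc, "trainings during selection: 0")
print("wrapper:", wrapper_cols, "accuracy", wrapper_acc,
      "trainings during selection:", wrapper_trainings)
```

In this sketch the filter scores each feature once from class statistics, while the wrapper trains one classifier per candidate feature at every step (4 + 3 = 7 models to select two of four features). That cost grows quickly with the number of features, which is consistent with the execution-time advantage reported for the filter method.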