Abstract
The rapid growth of data makes data classification a challenging task. Feature selection addresses this challenge: by reducing the dimensionality of the data, it can yield higher classification accuracy. Particle Swarm Optimization (PSO) is a computational technique that can be applied to the feature selection process to search for an optimal solution. This paper proposes a PSO and F-Score based feature selection algorithm that selects the significant features contributing to improved classification accuracy. The performance of the proposed method is evaluated with several classifiers: support vector machine (SVM), Naive Bayes, k-nearest neighbors (KNN), and Decision Tree. The experimental results show that the proposed method outperforms the other methods compared.

Feature selection is an important task in data classification because it improves the accuracy of the classification process, and it therefore contributes substantially to the field of pattern classification. It reduces the inputs for processing and analysis to a manageable size and facilitates efficient analysis of the given dataset.

Feature selection algorithms are classified into three types: wrapper, filter, and embedded approaches. In the wrapper approach, a learning algorithm is used as the performance criterion together with a search algorithm. The search algorithm generates feature subsets according to a search strategy, and these subsets are selected based on how the learning algorithm performs under a given criterion. The filter approach selects a significant feature subset before any classification algorithm is applied and removes the least significant features from the dataset. The embedded approach uses part of the learning algorithm itself to select the features, and it is less computationally expensive than the wrapper approach.
The embedded approach combines the filter and wrapper approaches. Two types of optimization techniques can be employed in feature selection: deterministic algorithms and stochastic algorithms. Deterministic algorithms include approaches such as breadth-first search (BFS), depth-first search (DFS), and the gradient method. Stochastic algorithms make use of random variables. In stochastic optimization, the random variables appear in the formulation of the optimization problem itself, which involves random objective functions or random constraints. Hence, the outputs of these algorithms are not always the same from run to run. These algorithms include particle swarm optimization (PSO), genetic algorithm (GA), and ant colony optimization (ACO).
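
To make the ideas above concrete, the following is a minimal sketch of how an F-score filter and a stochastic PSO search could be combined. It is not the algorithm proposed in the paper, whose details are not given in this excerpt. The F-score uses the commonly cited two-class formulation (between-class separation of the means divided by the sum of within-class sample variances), and the search is a sigmoid-based binary PSO. The function names `f_scores` and `binary_pso`, the fitness function, the per-feature penalty of 1.0, the toy dataset, and all parameter values are assumptions chosen for illustration only.

```python
import math
import random


def f_scores(X, y):
    """Two-class F-score per feature (labels 0 and 1, at least 2 rows per class)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        col_pos = [row[j] for row in pos]
        col_neg = [row[j] for row in neg]
        mean_all = sum(row[j] for row in X) / len(X)
        mean_pos = sum(col_pos) / len(col_pos)
        mean_neg = sum(col_neg) / len(col_neg)
        # Separation of the two class means around the overall mean.
        between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
        # Within-class sample variances (n - 1 in the denominator).
        var_pos = sum((v - mean_pos) ** 2 for v in col_pos) / (len(col_pos) - 1)
        var_neg = sum((v - mean_neg) ** 2 for v in col_neg) / (len(col_neg) - 1)
        within = var_pos + var_neg
        scores.append(between / within if within > 0 else 0.0)
    return scores


def binary_pso(fitness, n_bits, n_particles=10, n_iter=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Sigmoid-based binary PSO that maximizes fitness(mask) over 0/1 masks."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp the velocity
                # The sigmoid of the velocity is the probability of setting the bit.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 5.1], [3.2, 2.9], [2.9, 4.1]]
y = [0, 0, 0, 1, 1, 1]
scores = f_scores(X, y)


def fitness(mask):
    # Hypothetical fitness: total F-score of the chosen features minus a
    # penalty of 1.0 per selected feature, to favour small subsets.
    return sum(s for s, bit in zip(scores, mask) if bit) - 1.0 * sum(mask)


best_mask, best_fit = binary_pso(fitness, n_bits=len(scores))
print("F-scores:", scores)
print("Selected mask:", best_mask, "fitness:", best_fit)
```

Because the search is stochastic, a different `seed` may lead to a different mask, which illustrates why the outputs of such algorithms are not always the same. In a wrapper-style variant, `fitness` would instead return the validation accuracy of a classifier trained on the selected features.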