An Integrated Information Gain with A Black Hole Algorithm for Feature Selection: A Case Study of E-mail Spam Filtering

Amaal Mahmood, Adnan Hadi Mahdi Al-Helali

doi:10.24996/ijs.2023.64.9.38

Abstract

Two problems limit current spam e-mail detection systems: low classification accuracy and the high dimensionality of the feature space. In machine learning (ML), feature selection (FS), used as a global optimization strategy, reduces data redundancy and yields precise and acceptable results. This paper proposes an FS algorithm based on the black hole algorithm to reduce feature dimensionality and improve the accuracy of spam e-mail classification. Each star (candidate solution) encodes a feature subset in binary form, with a sigmoid function mapping the star's position to binary values. The proposed Binary Black Hole algorithm (BBH) searches the feature space for the best feature subsets, guided by a fitness function proportional to the accuracy of a Naive Bayes Classifier (NBC). The BBH is evaluated on the SpamBase dataset in terms of both classifier performance and the dimension of the selected feature vector used as classifier input. The experiments show that the BBH produces good FS results even with a small number of selected features, indicating that the NBC-based BBH can achieve good spam e-mail classification accuracy.
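The abstract does not give the BBH update equations, the exact binarization rule, the NBC variant, or how the data are split, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the classic black hole update (each star moves toward the black hole by x ← x + rand·(x_BH − x)), an event horizon radius R = f_BH / Σ f_i, stochastic sigmoid binarization (bit = 1 if rand < S(x)), a Gaussian Naive Bayes classifier, and fitness equal to validation accuracy. The helpers `make_toy_data`, `fitness`, and `binary_black_hole` are hypothetical names, and the synthetic data merely stand in for SpamBase.

```python
import math
import random

def sigmoid(x):
    """Sigmoid transfer function S(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng):
    """Assumed rule: feature j is selected with probability S(x_j)."""
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]

def fit_gnb(X, y, cols):
    """Gaussian Naive Bayes (assumed NBC variant): log prior plus per-feature mean/variance."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        params = []
        for j in cols:
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9  # avoid zero variance
            params.append((j, mu, var))
        model[c] = (math.log(len(rows) / len(X)), params)
    return model

def log_posterior(model, c, x):
    log_prior, params = model[c]
    return log_prior + sum(-0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
                           for j, mu, var in params)

def fitness(mask, data):
    """Hypothetical fitness: NB accuracy on a validation split using only the selected features."""
    X_tr, y_tr, X_va, y_va = data
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    model = fit_gnb(X_tr, y_tr, cols)
    hits = sum(max(model, key=lambda c: log_posterior(model, c, x)) == y for x, y in zip(X_va, y_va))
    return hits / len(X_va)

def binary_black_hole(data, n_features, n_stars=10, n_iter=30, seed=0):
    rng = random.Random(seed)

    def new_star():
        return [rng.uniform(-4.0, 4.0) for _ in range(n_features)]

    pos = [new_star() for _ in range(n_stars)]
    masks = [binarize(p, rng) for p in pos]
    fits = [fitness(m, data) for m in masks]
    best = max(range(n_stars), key=fits.__getitem__)
    # The black hole (best star so far) is stored separately from the population.
    bh_pos, bh_mask, bh_fit = pos[best][:], masks[best][:], fits[best]
    history = []
    for _ in range(n_iter):
        for i in range(n_stars):
            r = rng.random()
            pos[i] = [x + r * (b - x) for x, b in zip(pos[i], bh_pos)]  # pull toward black hole
            masks[i] = binarize(pos[i], rng)
            fits[i] = fitness(masks[i], data)
            if fits[i] > bh_fit:  # a better star becomes the new black hole
                bh_pos, bh_mask, bh_fit = pos[i][:], masks[i][:], fits[i]
        total = sum(fits)
        radius = bh_fit / total if total > 0 else 0.0  # event horizon radius
        for i in range(n_stars):
            if math.dist(pos[i], bh_pos) < radius:  # swallowed: re-seed a random star
                pos[i] = new_star()
                masks[i] = binarize(pos[i], rng)
                fits[i] = fitness(masks[i], data)
                if fits[i] > bh_fit:
                    bh_pos, bh_mask, bh_fit = pos[i][:], masks[i][:], fits[i]
        history.append(bh_fit)
    return bh_mask, bh_fit, history

def make_toy_data(n=200, d=10, informative=3, seed=1):
    """Synthetic stand-in for SpamBase: only the first `informative` features depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label if j < informative else 0.0, 1.0) for j in range(d)])
        y.append(label)
    split = int(0.6 * n)
    return X[:split], y[:split], X[split:], y[split:]

if __name__ == "__main__":
    data = make_toy_data()
    mask, acc, _ = binary_black_hole(data, n_features=10)
    print("selected:", [j for j, b in enumerate(mask) if b], "validation accuracy:", acc)
```

Keeping the black hole as a separate best-so-far record makes the reported fitness non-decreasing across iterations; a real SpamBase experiment would replace `make_toy_data` with the 57-feature dataset and would likely use cross-validation rather than a single split.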
