Two-phase fuzzy feature-filter based hybrid model for spam classification

Gazal Gazal, Kapil Juneja

doi:10.1016/j.jksuci.2022.10.025


Open Access

https://doi.org/10.1016/j.jksuci.2022.10.025


Abstract

A spam message is unsolicited data pushed into a public, private, or commercial network; it occupies resources and reduces the reliability of the network. This paper investigates a two-level, filter-based hybrid model for identifying spam content or messages accurately. At level 1 of the model, a high-level filter removes irrelevant and non-significant features and content. At level 2, a fuzzy-based composite evaluator performs low-level filtration and identifies the most contributing and effective features. In this composite filter, chi-square and ReliefF ranking scores are computed for each significant feature. A two-phase fuzzy method is then applied to these rankings to generate a reduced and relevant feature set. In the final stage, the Naive Bayes and random forest classifiers are combined using the majority voting method to generate a probabilistic score and detect spam messages. The proposed model is implemented on the CSDMC2010 SPAM, spambase, and SMS Spam Collection datasets. The analytical evaluation uses error-based and accuracy-based performance measures. The comparative analysis is carried out against various conventional filters, classifiers, and state-of-the-art methods. The results show that the proposed model achieved an average accuracy of 98.80% on CSDMC2010, 97.79% on spambase, and 98.84% on SMS Spam Collection, and that it outperformed the existing conventional and recent algorithms and models.
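The level-2 filter and the final voting stage can be illustrated with a minimal sketch. The abstract does not give the membership functions, fuzzy rules, or voting details, so everything below is an assumption made for illustration: the ramp-shaped membership `high_membership`, its breakpoints, the use of `min` as the fuzzy AND, the cut-off `cut`, and the averaging of the two class probabilities in `soft_vote`. The chi-square and ReliefF scores are taken as precomputed inputs, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale raw ranker scores to [0, 1] so the two rankers are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def high_membership(x: float, start: float = 0.3, full: float = 0.7) -> float:
    """Assumed ramp membership for the fuzzy set 'highly relevant'."""
    if x <= start:
        return 0.0
    if x >= full:
        return 1.0
    return (x - start) / (full - start)


def fuzzy_select(chi2: Sequence[float], relief: Sequence[float],
                 cut: float = 0.5) -> List[int]:
    """Keep the indices of features that both rankers rate as highly relevant.

    Assumption: the joint membership is the minimum (fuzzy AND) of the two
    per-ranker memberships; the paper's actual rule base may differ.
    """
    keep = []
    for i, (c, r) in enumerate(zip(min_max(chi2), min_max(relief))):
        joint = min(high_membership(c), high_membership(r))
        if joint >= cut:
            keep.append(i)
    return keep


def soft_vote(p_nb: float, p_rf: float,
              threshold: float = 0.5) -> Tuple[float, bool]:
    """Average the spam probabilities of two classifiers (assumed soft voting).

    Returns the combined score and whether the message is flagged as spam.
    """
    score = (p_nb + p_rf) / 2.0
    return score, score >= threshold
```

As a usage example, `fuzzy_select([10.0, 0.5, 8.0, 1.0], [0.40, 0.02, 0.05, 0.35])` keeps only feature 0, because it is the only feature that both rankers score highly. Features 2 and 3 are each favoured by just one ranker and are dropped under the `min` rule.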
