Abstract

A spam message is unsolicited data pushed onto a public, private, or commercial network, where it occupies resources and degrades the reliability of the network. This paper investigates a two-level, filter-based hybrid model for identifying spam content or messages accurately. At level 1 of this model, a high-level filter removes non-relevant and non-significant features and content. At level 2, a fuzzy-based composite evaluator performs low-level filtration and identifies the most contributing and effective features. In this composite filter, ChiSquare and ReliefF rankings are computed for each significant feature. A two-phase fuzzy method is then applied to these rankings to generate a reduced and relevant feature set. In the final stage, the Naive Bayes and random forest classifiers are combined using the majority voting method to generate a probabilistic score and detect spam messages. The proposed model is implemented on the CSDMC2010 SPAM, spambase, and SMS Spam Collection datasets. The analytical evaluation uses error-based and accuracy-based performance measures. The comparative analysis is carried out against various conventional filters, classifiers, and state-of-the-art methods. The results show that the proposed model achieved an average accuracy of 98.80% on CSDMC2010, 97.79% on spambase, and 98.84% on the SMS Spam Collection dataset, and that it outperformed the existing conventional and recent algorithms and models.
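As a rough illustration of two building blocks named above, the following Python sketch computes a chi-square score for a single binary feature and combines two classifier outputs into one probabilistic score. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `chi_square_score` and `soft_vote`, the restriction to binary features, the averaging rule, and the 0.5 threshold are all hypothetical choices made for illustration. The abstract does not specify how the vote is turned into a score, and the ReliefF and fuzzy stages are not shown.

```python
def chi_square_score(feature, labels):
    """Chi-square statistic for one binary feature against binary labels.

    Assumption: both inputs are sequences of 0/1 values of equal length.
    A higher score suggests a stronger association with the class.
    """
    n = len(labels)
    # Observed 2x2 contingency table: observed[feature_value][label]
    observed = [[0, 0], [0, 0]]
    for f, y in zip(feature, labels):
        observed[f][y] += 1

    row_totals = [sum(observed[0]), sum(observed[1])]
    col_totals = [observed[0][0] + observed[1][0],
                  observed[0][1] + observed[1][1]]

    score = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty rows/columns
                score += (observed[i][j] - expected) ** 2 / expected
    return score


def soft_vote(p_nb, p_rf, threshold=0.5):
    """Average two spam probabilities and apply a decision threshold.

    Assumption: p_nb and p_rf stand in for the spam probabilities from a
    Naive Bayes and a random forest classifier. Averaging is one possible
    way to obtain a probabilistic score from a two-classifier vote.
    """
    score = (p_nb + p_rf) / 2
    return score, score >= threshold


if __name__ == "__main__":
    # Toy data: the feature matches the label exactly.
    print(chi_square_score([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0]))
    print(soft_vote(0.75, 0.25))
```

In a ranking step, a score like this would be computed per feature and the features sorted by it; how the paper fuses the ChiSquare and ReliefF rankings through its fuzzy stages is not described in the abstract, so that step is left out here.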
