Abstract

This research proposes an improved spam filtering technique for suspected email messages, based on feature weighting and a combination of two-step clustering and logistic regression. Unique, important features are used as the optimal input to the proposed hybrid approach. The study adopts a spam detector model based on a distance measure and a threshold value. The aim of the model is to study and select distinct features for email filtering, using the feature weight method for dimension reduction. The two-step clustering algorithm generates a new feature called “Label”, which differentiates the diverse emails and groups them by inter-sample similarity. This simplifies the filtering step, in which a logistic regression classifier distinguishes the hidden patterns of spam and non-spam emails. Experiments were conducted on the UCI spam dataset. The results suggest that the proposed email filtering is promising compared with other modern spam filtering methods.
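To make the hybrid idea concrete, the following is a minimal sketch of the cluster-then-classify step, not the authors' implementation. It assumes plain k-means as a stand-in for two-step clustering, a hand-written gradient-descent logistic regression, and a hypothetical six-email toy dataset; the function names are illustrative only.

```python
import math

def kmeans_labels(rows, k=2, iters=20):
    """Plain k-means (a stand-in for two-step clustering); one cluster id per row."""
    n = len(rows)
    # Deterministic init (an assumption): evenly spaced rows, first and last for k=2.
    centroids = [list(rows[i * (n - 1) // max(k - 1, 1)]) for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assign each row to its nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(r, centroids[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def train_logreg(X, y, lr=0.1, epochs=2000):
    """Logistic regression fitted by batch gradient descent; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                grad_w[j] += (p - yi) * xj
            grad_b += p - yi
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    """Return 1 (spam) when the linear score is non-negative, else 0 (non-spam)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

# Hypothetical toy data: two features per email, 1 = spam, 0 = non-spam.
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [2.0, 2.5], [2.2, 2.1], [2.4, 2.3]]
y = [0, 0, 0, 1, 1, 1]

labels = kmeans_labels(X, k=2)                       # the generated "Label" feature
X_aug = [x + [lab] for x, lab in zip(X, labels)]     # append it as an extra column
w, b = train_logreg(X_aug, y)
predictions = [predict(w, b, x) for x in X_aug]
```

The design choice illustrated here is that the cluster id is appended as one more input column, so the classifier can use the grouping found by the unsupervised step.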

Highlights

  • Email is now considered one of the most economical and essential means of communication in the world

  • The original dataset is the spam data commonly used in spam filtering research (Spambase), while the weighted dataset is generated from it by calculating the average of each feature in the original data

  • Selecting the important features increases spam filtering performance, because the weighting process reduces the number of features
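The weighting step in the highlights above can be sketched as follows. The source only states that the weighted dataset comes from per-feature averages; ranking features by that average and keeping the top `k` is an assumption made here for illustration, as are the function names and the toy data.

```python
from statistics import mean

def feature_weights(rows):
    """Return the average value of each column (one weight per feature)."""
    return [mean(col) for col in zip(*rows)]

def select_features(rows, k):
    """Keep the k columns with the largest average; return (indices, reduced rows)."""
    weights = feature_weights(rows)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    top = sorted(ranked[:k])  # restore the original column order
    reduced = [[row[i] for i in top] for row in rows]
    return top, reduced

# Hypothetical toy data: 4 emails, 3 word-frequency features.
emails = [
    [0.0, 0.5, 2.0],
    [0.1, 0.0, 1.0],
    [0.0, 0.1, 3.0],
    [0.1, 0.2, 2.0],
]
indices, reduced = select_features(emails, k=2)
```

With this toy input the first column has the smallest average, so it is the one dropped; a real selection rule would need to follow the weighting criterion defined in the full paper.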


Summary

Introduction

Email is considered one of the most economical and essential means of communication in the world. It is efficient, simple, and accessible to all because of the availability of the internet. The term spam is used for undesirable messages, or junk mail, sent to web users’ inboxes. Spammers can conveniently send large numbers of messages to millions of users at no cost [2]. As a result, it is common for web users to receive unsolicited email regularly.

