Abstract
In this paper, we propose a new online system that quickly detects malicious spam emails and, through daily updates, adapts to changes in the email contents and in the Uniform Resource Locator (URL) links leading to malicious websites. We introduce an autonomous function that lets a server generate its own training examples: double-bounce emails are collected automatically, and their class labels are assigned by SPIKE, a crawler-type software that analyzes website maliciousness. Since spammers generally use botnets to spread numerous malicious emails within a short time, such distributed spam emails often have the same or similar contents. Therefore, it is not necessary to learn all spam emails. To adapt to new malicious campaigns quickly, only new types of spam emails should be selected for learning, and this can be realized by introducing an active learning scheme into the classifier model. For this purpose, we adopt Resource Allocating Network with Locality Sensitive Hashing (RAN-LSH) as a classifier model with a data selection function. In RAN-LSH, spam emails that are the same as or similar to ones already learned are quickly looked up in a Locality Sensitive Hashing (LSH) hash table, and matched emails that fall in "well-learned" regions are discarded without being used as training data. To analyze email contents, we adopt the Bag of Words (BoW) approach and generate feature vectors whose attributes are transformed using the normalized term frequency-inverse document frequency (TF-IDF). To evaluate the performance of the proposed system, we use a data set of double-bounce spam emails collected at the National Institute of Information and Communications Technology (NICT) in Japan from March 1st, 2013 until May 10th, 2013. The results confirm that the proposed system can detect malicious spam emails with a high detection rate.
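As an illustration of the feature extraction step mentioned in the abstract, the following is a minimal sketch of BoW feature vectors with normalized TF-IDF weights. The whitespace tokenization, the `log(N / df)` form of the IDF term, the L2 normalization, and the function name `tfidf_vectors` are assumptions made for this example; the abstract does not specify which TF-IDF variant or normalization the system uses.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) where each vector is an L2-normalized TF-IDF row.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    # Bag of Words: lowercase and split on whitespace (a simplification).
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vec = [counts[tok] / total * idf[tok] for tok in vocab]
        # Normalize to unit length so that long and short emails are comparable.
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec] if norm > 0 else vec)
    return vocab, vectors


if __name__ == "__main__":
    emails = [
        "click this link now",
        "click this invoice link",
        "meeting agenda attached",
    ]
    vocab, vectors = tfidf_vectors(emails)
    print(vocab)
    print([round(v, 3) for v in vectors[0]])
```

Terms that occur in every document receive an IDF of zero under this variant, so they do not contribute to the feature vector.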
Highlights
Emails have become one of the most frequently used methods for cyber attacks
We introduce an outlier detection mechanism into Resource Allocating Network with Locality Sensitive Hashing (RAN-LSH) in order to reduce the number of spam emails to be checked by SPIKE
We propose a spam email detection system that combines the RAN-LSH classifier [15] with SPIKE, so that learning is faster than when using SPIKE alone
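The data selection idea behind these highlights can be sketched as follows. This is not the authors' RAN-LSH implementation: the random-hyperplane hash, the class name `LshSelector`, and its methods are hypothetical, and the criterion for marking a bucket as "well-learned" is assumed to be decided elsewhere (for example, by the classifier). The sketch only shows how a hash-table lookup can discard emails that fall into already-learned buckets, so that fewer emails need to be labeled and learned.

```python
import random


class LshSelector:
    """Hypothetical LSH-based filter using random-hyperplane (sign) hashing."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        # One random hyperplane per hash bit.
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)
        ]
        # Hash keys of buckets assumed to be "well-learned".
        self.well_learned = set()

    def key(self, x):
        """Map a feature vector to a bucket key: one sign bit per hyperplane."""
        return tuple(
            int(sum(p_i * x_i for p_i, x_i in zip(plane, x)) >= 0)
            for plane in self.planes
        )

    def mark_well_learned(self, x):
        """Record that the bucket containing x no longer needs training data."""
        self.well_learned.add(self.key(x))

    def select(self, batch):
        """Keep only vectors whose bucket is not yet well-learned."""
        return [x for x in batch if self.key(x) not in self.well_learned]


if __name__ == "__main__":
    selector = LshSelector(dim=4)
    known = [1.0, 0.0, 0.0, 0.0]
    selector.mark_well_learned(known)
    batch = [known, [2.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]]
    print(selector.select(batch))
```

Because the hash depends only on the sign of each projection, a vector pointing in the same direction as a well-learned one falls into the same bucket and is discarded, while a vector pointing in the opposite direction lands in a different bucket and is kept for learning.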
Summary
Emails have become one of the most frequently used methods for cyber attacks. The most worrying email-based attack is the Targeted Malicious Email (TME) [1] [2]. The victim's computer becomes a back door for the attackers, who can then enter the targeted person's network and steal confidential information. Another typical email-based cyber attack is the malicious spam email attack, which aims to spread numerous emails with Uniform Resource Locator (URL) links leading to malicious websites. Examples include fake notifications about a conference or journal aimed at recipients with academic status, and notifications about false documents such as telecommunication service bills, faxes, and voicemails, in which the victims are given a link to get more information [4]. This technique is called Social Engineering [5], which Hadnagy [6] defines as "The Art of Human Hacking". It becomes difficult for ordinary users to distinguish non-malicious spam emails from malicious ones, and spam emails from normal emails.
More From: Journal of Intelligent Learning Systems and Applications