Abstract
In this paper, we propose a new online system that detects malicious spam emails and adapts to changes in the malicious URLs found in spam email bodies by updating itself daily. For this purpose, we develop an autonomous system that learns from double-bounce emails collected at a mail server. To adapt to new malicious campaigns, only new types of spam emails are learned by introducing an active learning scheme into a classifier model. Here, we adopt a Resource Allocating Network with Locality Sensitive Hashing (RAN-LSH) as a classifier model with data selection. In this data selection, spam emails that are the same as or similar to ones already learned are quickly looked up in a hash table using Locality Sensitive Hashing, and such spam emails are discarded without learning. On the other hand, malicious spam emails sometimes change drastically with the arrival of a new malicious campaign. In this case, it is not appropriate for a classifier to label such spam emails as malicious or benign; they should instead be analyzed with a more reliable method such as a malware analyzer. To find such new types of spam emails, an outlier detection mechanism is implemented in RAN-LSH. To analyze email contents, we adopt the Bag-of-Words (BoW) approach and generate feature vectors whose attributes are transformed based on the normalized term frequency-inverse document frequency (TF-IDF). To evaluate the developed system, we use a dataset of double-bounce spam emails collected from March 1st, 2013 to August 29th, 2013. In the experiment, we study the effect of introducing outlier detection into RAN-LSH. The results confirm that introducing outlier detection improves the detection accuracy on average over the testing period.
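The data selection step described above can be illustrated with a minimal sketch. It is not the RAN-LSH implementation: the abstract does not specify the hash family, the thresholds, or the classifier update, so random-hyperplane hashing, the smoothed IDF formula, the two threshold values, and all names (`tfidf_vectors`, `HyperplaneLSH`, `select`) are assumptions made for illustration only.

```python
import math
import random
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return (vocab, L2-normalized TF-IDF vectors) for a list of token lists."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for t, c in Counter(d).items():
            # Smoothed IDF (an assumption) so that no term gets zero weight.
            idf = math.log((1 + n) / (1 + df[t])) + 1.0
            v[index[t]] = (c / len(d)) * idf
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


class HyperplaneLSH:
    """Single hash table keyed by the sign pattern of random projections."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.table = defaultdict(list)

    def key(self, v):
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0
                     for plane in self.planes)

    def add(self, v):
        self.table[self.key(v)].append(v)

    def nearest_similarity(self, v):
        """Best cosine similarity within v's bucket, or None if it is empty."""
        bucket = self.table.get(self.key(v), [])
        # Vectors are unit length, so the dot product is the cosine.
        return max((sum(a * b for a, b in zip(v, u)) for u in bucket),
                   default=None)


def select(lsh, v, dup_threshold=0.95, outlier_threshold=0.2):
    """Decide what to do with one email vector (thresholds are hypothetical)."""
    sim = lsh.nearest_similarity(v)
    if sim is not None and sim >= dup_threshold:
        return "skip"      # already learned: discard without learning
    if sim is None or sim < outlier_threshold:
        return "outlier"   # too novel: hand over to a malware analyzer
    return "learn"         # new but familiar enough: update the classifier
```

With this sketch, an exact repeat of a stored email returns `"skip"`, while an email that shares no terms with anything stored returns `"outlier"`. Because a single table only compares vectors that fall into the same bucket, near neighbors in other buckets are missed; practical LSH schemes usually reduce this by using several tables.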