Study on the effectiveness of anomaly detection for spam filtering

Carlos Laorden,Xabier Ugarte-Pedrero,Igor Santos,Borja Sanz,Javier Nieves,Pablo G Bringas

doi:10.1016/j.ins.2014.02.114

Abstract

Spam has become an important problem for computer security because it is a channel for spreading threats, including computer viruses, worms and phishing. Currently, more than 85% of received emails are spam. Historical approaches to combating these messages, including simple techniques such as sender blacklisting or using email signatures, are no longer completely reliable on their own. Many solutions utilise machine-learning approaches trained with statistical representations of the terms that usually appear in the emails. Nevertheless, these methods require a time-consuming training step with labelled data. Dealing with the limited availability of labelled training instances slows down the progress of filtering systems and offers advantages to spammers. In this paper, we present a study of the effectiveness of anomaly detection applied to spam filtering, which reduces the necessity of labelling spam messages and only employs the representation of one class of emails (i.e., legitimate or spam). This study includes a presentation of the first anomaly based spam filtering system, an enhancement of this system that applies a data reduction algorithm to the labelled dataset to reduce processing time while maintaining detection rates and an analysis of the suitability of choosing legitimate emails or spam as a representation of normality.

Full Text