Abstract

The unsolicited bulk messages are widespread in the applications of short messages. Although the existing spam filters have satisfying performance, they are facing the challenge of an adversary who misleads the spam filters by manipulating samples. Until now, the vulnerability of spam filtering technique for short messages has not been investigated. Different from the other spam applications, a short message only has a few words and its length usually has an upper limit. The current adversarial learning algorithms may not work efficiently in short message spam filtering. In this paper, we investigate the existing good word attack and its counterattack method, i.e. the feature reweighting, in short message spam filtering in an effort to understand whether, and to what extent, they can work efficiently when the length of a message is limited. This paper proposes a good word attack strategy which maximizes the influence to a classifier with the least number of inserted characters based on the weight values and also the length of words. On the other hand, we also proposes the feature reweighting method with a new rescaling function which minimizes the importance of the feature representing a short word in order to require more inserted characters for a successful evasion. The methods are evaluated experimentally by using the SMS and the comment spam dataset. The results confirm that the length of words is a critical factor of the robustness of short message spam filtering to good word attack.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call