Abstract

Online Social Networks (OSNs) allow easy membership, which leads to the registration of a huge population and the generation of voluminous information. These characteristics attract spammers, whose spam may cause users annoyance, financial loss, or loss of personal information, and also weaken the reputation of social network sites. Most spam detection methods apply machine learning techniques to user-based and content-based features. However, these annotated features are difficult to extract in real time because of the privacy policies of most social network sites. Even where features can be extracted, their large number makes the manual extraction process complex and time-consuming. There is therefore a need for text-level spam detection that does not require the extraction of such elaborate features. Existing deep learning-based text classification methods, including those built on a single attention mechanism, do not perform well, because social network data are sparse and consist of short, noisy texts. Moreover, spammers avoid direct spam words and use indirect words to evade spam filtering techniques, which makes social network spam text dynamic and non-stationary. These indirect words carry hidden context that causes an attention drift problem. To avoid attention drift, this deep learning-based text-level spam detection technique (TextSpamDetector) proposes a conjoint attention mechanism together with two attention mechanisms, namely normal attention and context-preserving attention. One attention mechanism finds the important words, while the other keeps attention focused on the target context by referring to a higher-level abstraction of the context vector; together they solve the attention drift problem. The two mechanisms refer to different context representations of the input text to find informative words in the structural context representation.
This structural context representation, which contains both local semantic features and global semantic dependency features, is generated by a CNN and a BiLSTM. The proposed model is compared with existing spam detection techniques on three datasets, and the experimental results show that it performs well in terms of accuracy, F-measure, and false-positive rate.
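
The abstract does not specify the scoring functions of the two attention mechanisms or how the conjoint step combines them, so the following is only a minimal sketch of the general idea under stated assumptions. The list `states` stands in for the per-word outputs of the CNN and BiLSTM. The function names `normal_attention`, `context_preserving_attention`, and `conjoint_attention` are hypothetical, and three choices are assumptions rather than the TextSpamDetector design: normal attention scores words against a learned query vector, context-preserving attention scores words against the mean of all word states (used here as a stand-in for the "higher-level abstraction of the context vector"), and the conjoint step simply concatenates the two attended summaries.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(states: List[Vector], query: Vector) -> Tuple[List[float], Vector]:
    """Score each word state against a query; return the weights and the weighted sum."""
    weights = softmax([dot(h, query) for h in states])
    dim = len(states[0])
    pooled = [sum(w * h[i] for w, h in zip(weights, states)) for i in range(dim)]
    return weights, pooled


def normal_attention(states: List[Vector], word_query: Vector) -> Tuple[List[float], Vector]:
    # Assumption: a (learned) query vector picks out individually important words.
    return attend(states, word_query)


def context_preserving_attention(states: List[Vector]) -> Tuple[List[float], Vector]:
    # Assumption: the higher-level context vector is the mean of all word states,
    # so each word is scored by how well it agrees with the text as a whole.
    n = len(states)
    context = [sum(h[i] for h in states) / n for i in range(len(states[0]))]
    return attend(states, context)


def conjoint_attention(states: List[Vector], word_query: Vector) -> Vector:
    # Assumption: combine the two attended summaries by concatenation.
    _, normal_vec = normal_attention(states, word_query)
    _, context_vec = context_preserving_attention(states)
    return normal_vec + context_vec


if __name__ == "__main__":
    # Toy stand-in for CNN + BiLSTM outputs: one 2-d vector per word.
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    query = [1.0, 0.0]
    print(conjoint_attention(states, query))  # a 4-d feature vector for a classifier
```

In a trained model the query vector and any projection matrices would be learned parameters, and the concatenated vector would feed a spam/ham classifier; the fixed values above only make the example deterministic.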

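The abstract reports accuracy, F-measure, and false-positive rate without defining them. A minimal sketch of the usual definitions follows, assuming spam is the positive class (label 1) and that F-measure means F1, the harmonic mean of precision and recall; `spam_metrics` is a hypothetical helper, not part of the published work.

```python
from typing import Dict, Sequence


def spam_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, F1, and false-positive rate; label 1 = spam, 0 = ham."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # ham passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ham wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "f_measure": f1,
        # Share of legitimate messages wrongly flagged as spam.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


print(spam_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 1, 0, 0, 1]))
# {'accuracy': 0.75, 'f_measure': 0.75, 'false_positive_rate': 0.25}
```

A low false-positive rate matters for spam filtering because each false positive hides a legitimate post from its intended reader.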