With the rise of the Internet in the early ’90s, many fraudulent activities migrated from the physical to the digital world; one of them is phishing. Phishing is a deceptive practice that exploits the human factor, which is the most vulnerable aspect of any security process. The scam relies heavily on social engineering techniques, and in particular on the principles of persuasion, to deceive individuals into disclosing sensitive information or performing malicious actions. This research explores the use of message subjectivity for detecting phishing attacks. It does so by assessing the impact of various data representations and classifiers on the automatic identification of principles of persuasion. It then investigates how the detected principles of persuasion can be leveraged to identify phishing attacks. The experiments revealed that no single data representation and classifier detects all principles of persuasion effectively. Instead, each principle requires its own tailored combination of data representation and classifier. The resulting Machine Learning models detect principles of persuasion automatically, with AUC-ROC scores ranging from 0.7306 to 0.8191. The detected principles of persuasion are then used for phishing detection. This study also emphasizes the need for user-friendly and comprehensible models. To validate the proposal, several families of classifiers were tested; among them, tree-based models (and Random Forest in particular) stand out as the preferred option. These models achieve a level of effectiveness similar to that of alternative methods while offering improved clarity and user-friendliness, with an AUC-ROC of 0.859842.
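For readers less familiar with the metric reported above, AUC-ROC can be read as the probability that a classifier scores a randomly chosen positive example higher than a randomly chosen negative one, with ties counted as half. The following is a minimal sketch of that pairwise definition, not the evaluation code used in this study; the function name `auc_roc` and the toy labels and scores are illustrative assumptions.

```python
def auc_roc(labels, scores):
    """Pairwise AUC-ROC: share of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical data: 1 = phishing, 0 = legitimate.
    toy_labels = [1, 0, 1, 0, 1]
    toy_scores = [0.9, 0.4, 0.35, 0.2, 0.8]
    print(round(auc_roc(toy_labels, toy_scores), 4))
```

Under this reading, a value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes. The quadratic loop is chosen for clarity; sorting-based implementations are preferable for large datasets.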