Purpose
Distinguishing phishing emails from legitimate emails continues to be a difficult task for most individuals. This study aims to investigate the psycholinguistic factors associated with deception in phishing email text and their effect on end-user ability to discriminate phishing emails from legitimate emails.

Design/methodology/approach
Email messages and end-user decisions collected from a laboratory phishing study were validated and analyzed using natural language processing methods (Linguistic Inquiry and Word Count) and penalized regression models (LASSO and Elastic Net) to determine the linguistic dimensions that attackers may use in phishing emails to deceive end-users and to measure the impact of such choices on end-user susceptibility to phishing.

Findings
We found that most participants, who played the role of a phisher in the study, chose to deceive their end-user targets by pretending to be a familiar individual and presenting time pressure or deadlines. Results show that use of words conveying certainty (e.g. always, never) and work-related features in the phishing messages predicted higher end-user vulnerability. In contrast, use of words that convey achievement (e.g. earn, win) or reward (e.g. cash, money) in the phishing messages predicted lower end-user vulnerability because such features are usually observed in scam-like messages.

Practical implications
Insights from this research show that analyzing emails for psycholinguistic features associated with computer-mediated deception could be used to fine-tune and improve spam and phishing detection technologies. This research also indicates which kinds of phishing attacks must be prioritized in antiphishing training programs.

Originality/value
Applying natural language processing and statistical modeling methods to analyze results from a laboratory phishing experiment to understand deception from both the attacker's and the end-user's perspectives is novel. Furthermore, results from this work advance our understanding of the linguistic factors associated with deception in phishing email text and their impact on end-user susceptibility.