Honey, I Chunked the Passwords: Generating Semantic Honeywords Resistant to Targeted Attacks Using Pre-trained Language Models

Fangyi Yu, Miguel Vargas Martin

doi:10.1007/978-3-031-35504-2_5

Abstract

Honeywords are fictitious passwords inserted into databases in order to detect password breaches. The major challenge is producing honeywords that are difficult to distinguish from real passwords. Although honeyword generation has been widely investigated, most existing research assumes that attackers have no knowledge of the users. These honeyword generating techniques (HGTs) may fail completely if attackers exploit users’ personally identifiable information (PII) and the real passwords contain that PII. The literature has demonstrated that password guessing is more effective when it focuses on each of the chunks that compose a password (e.g., “P@ssword123” contains two chunks: “P@ssword” and “123”), and it has been suggested that PII, when available, should be used to generate honeywords. We leverage these findings to base our HGT on any PII that may be contained within passwords, and introduce a new HGT, more robust than its counterparts in the literature, that generates honeywords with GPT-3 from the semantic chunks of their corresponding real passwords. Furthermore, we propose a new metric, HWSimilarity, to evaluate the capability of HGTs. HWSimilarity is a similarity metric based on a pre-trained language model that considers the semantic meaning of passwords when measuring how indistinguishable honeywords are from their real counterparts. Comparing our chunk-level GPT-3 HGT to two state-of-the-art HGTs and to GPT-3 alone, we show that our HGT generates honeywords that are harder to distinguish from real passwords than those of its counterparts.
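
To make the chunk-level idea concrete, the following is a minimal sketch and not the chunking or prompting procedure used in this work. It assumes a naive rule that splits a password into alternating digit and non-digit runs, which happens to reproduce the “P@ssword123” example above. The function names `split_digit_chunks` and `build_prompt` and the prompt wording are hypothetical and chosen only for illustration. No language model is called.

```python
import re


def split_digit_chunks(password: str) -> list[str]:
    """Split a password into alternating digit and non-digit runs.

    Assumption: this naive rule stands in for a real semantic chunker.
    """
    return re.findall(r"\d+|\D+", password)


def build_prompt(chunks: list[str], n: int) -> str:
    """Build an illustrative text prompt asking for n honeywords.

    Assumption: the wording is hypothetical, not the prompt from the paper.
    """
    listed = ", ".join(f'"{c}"' for c in chunks)
    return (
        f"Generate {n} passwords that look similar to a password "
        f"made of the chunks {listed}."
    )


if __name__ == "__main__":
    chunks = split_digit_chunks("P@ssword123")
    print(chunks)
    print(build_prompt(chunks, 5))
```

A learned chunker would be needed for passwords whose semantic units do not align with digit boundaries; the digit-based rule is used here only because it is short and easy to verify.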

Full Text