Abstract
Identity authentication is a main line of defense for network security, and passwords have long been the mainstream method of identity authentication. In password security research, large-scale password datasets play an important role in evaluating the efficiency of password attack algorithms, testing the feasibility of password strength meters, and correcting password probability models. However, because of user privacy, timeliness, effectiveness, and other factors, it is still very difficult for researchers to obtain real large-scale sets of plaintext user passwords. To address this, this paper proposes a fast simulative password set generation algorithm based on structure partitioning and string recombination, denoted SPSR-FSPG. The algorithm uses a probabilistic context-free grammar (PCFG) to model the structure of passwords, and constructs a string generation model based on a recurrent neural network to generate different types of strings, so as to learn the character composition of the passwords in the original dataset. In addition, the model takes into account users' password reuse and modification behavior. Finally, the method is verified experimentally on six real Chinese and English password sets. The results show that SPSR-FSPG generates passwords faster than the compared algorithms. In terms of true password coverage, the SPSR-FSPG simulative password set improves on SPPG and PCFG by 11.36% and 17.5%, respectively, and on OMEN and 4-Markov by about 122.73% and 130.3%, respectively. The goodness of fit to the Zipf distribution stays above 0.95, which is better than the 0.9 achieved by SPPG. At the same time, the SPSR-FSPG simulative password set is closer to the real password set in terms of length and character composition.
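As an illustration of how a "true password coverage" figure and a relative improvement such as those quoted above might be computed, here is a minimal Python sketch. It assumes that coverage means the fraction of distinct real test passwords that also occur in the generated set; the abstract does not state the exact definition, so the paper may, for example, count repeated passwords with multiplicity. The function names `coverage` and `relative_increase` are hypothetical and not taken from the paper.

```python
def coverage(generated, real_test):
    """Fraction of distinct real test passwords found in the generated set.

    Assumption: coverage is measured over distinct passwords, not weighted
    by how often each password occurs in the real set.
    """
    generated_set = set(generated)
    real_set = set(real_test)
    if not real_set:
        return 0.0
    return len(real_set & generated_set) / len(real_set)


def relative_increase(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Toy data, made up purely for illustration.
simulated = ["123456", "password", "abc123"]
real = ["123456", "qwerty", "password", "iloveyou"]
print(coverage(simulated, real))
```

Under this reading, a statement such as "increased by 11.36% relative to SPPG" describes `relative_increase` applied to the two coverage values, not a difference in percentage points.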
Highlights
Passwords have become one of the most popular user authentication methods [1]–[3]
Large-scale real password datasets are frequently used to mine the habit of constructing passwords [5], [6]
The associate editor coordinating the review of this manuscript and approving it for publication was Sohail Jabbar.
We use a hybrid model of Probabilistic Context-Free Grammar (PCFG) and a BiLSTM recurrent neural network (RNN) to mine the structural and character-level features of the original sample, and apply the perturbation-based sample generation idea from machine learning to simulate users' password habits. The result is a fast password set generation model based on structure partitioning and a recurrent neural network, denoted SPSR-FSPG
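The structure-modeling half of this hybrid can be sketched in a few lines of Python. The sketch assumes the segment classes of classic PCFG password models: runs of letters (L), digits (D), and other symbols (S), each tagged with its length. The source does not spell out the exact partitioning rules of SPSR-FSPG, and the BiLSTM string generator is not shown. The names `segment` and `structure_probabilities` are hypothetical.

```python
import re
from collections import Counter

# One named group per segment class: L = ASCII letters, D = digits,
# S = everything else. This is an assumed, classic-PCFG-style partition.
_SEGMENT = re.compile(r"(?P<L>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<S>[^A-Za-z0-9]+)")


def segment(password):
    """Split a password into (tag, string) pairs, e.g. ("L8", "password")."""
    parts = []
    for match in _SEGMENT.finditer(password):
        text = match.group(0)
        # lastgroup is the name of the alternative that matched: L, D or S.
        parts.append((f"{match.lastgroup}{len(text)}", text))
    return parts


def structure_probabilities(passwords):
    """Estimate the probability of each structure (e.g. "L3D3") by frequency."""
    counts = Counter(
        "".join(tag for tag, _ in segment(pw)) for pw in passwords
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {structure: n / total for structure, n in counts.items()}


# Toy data, made up purely for illustration.
print(segment("password123!"))
print(structure_probabilities(["abc123", "xyz789", "hello!"]))
```

In a full pipeline of this kind, the learned structure probabilities would select a template such as `L8D3S1`, and a separate string model would fill in each segment. The source assigns that second role to the RNN-based string generator.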
Summary
Passwords have become one of the most popular user authentication methods [1]–[3]. They are easy to deploy but are accompanied by serious security threats [4]. Users do not pay enough attention to these websites, so vulnerable behaviors are common (password reuse, overly short passwords, passwords composed of a single character type, etc.), which means the research results do not directly bear on the security protection of highly sensitive systems such as online banking and enterprise information systems. It is still very difficult for researchers to acquire large-scale password sets, yet both real password sets and high-quality simulative password sets are important for generating password dictionaries and for evaluating the efficiency of guessing algorithms and the validity of password strength meters.