Abstract

Using passwords to verify a user's identity is the most widely deployed method for electronic authentication. When system administrators need to recover lost passwords or test accounts for easily guessable passwords, it can require millions of hash function and string comparison operations. These operations can be computationally expensive but are easily parallelizable because each password can be tested independently. Therefore, using high performance computing (HPC) can greatly reduce the time required to perform password recovery. Due to the high level of fine-grained parallelism of this type of problem, GPU computing using Compute Unified Device Architecture (CUDA) can be used to further improve performance. The scale of HPC can be further increased through the use of multiple GPUs, but this requires communication between the GPU devices and can reduce the overall performance due to increased communications latency. In this work a well established HPC framework, Message Passing Interface (MPI), was used to minimize the amount of latency and handle the communication between the devices. This allowed for a course-grained division of the problem using MPI where each device applies a fine-grained division of the problem using CUDA to perform the actual calculations. This paper describes three dictionary-based password recovery algorithms that use both MPI and CUDA. In this approach the hashed values of known words are computed and compared with hash values of unknown user passwords. The algorithms differed in GPU memory utilization and how the data was divided and distributed among the MPI nodes and GPU devices. A divided dictionary algorithm split the dictionary of potential passwords over the G PUs and copied the password database to each GPU. A divided password database algorithm split the password database and copied the potential passwords. A minimal memory algorithm split the password database and sequentially processed individual passwords on the GPUs. The divided dictionary and the divided password database algorithms performed well, resulting in a speedup of 57x and 40x over a single processor using 8 GPUs across 4 compute nodes, respectively. Illustrating the cost of communication latency between MPI nodes and GPUs, the minimal memory algorithm performed significantly slower than a single CPU. The algorithms are shown to scale well to multiple GPUs, so this password recovery system could be used for much larger systems for larger databases. In addition to recovering lost passwords, this work could be used to help improve the security of computer systems by identifying accounts with weak or common passwords. The framework described may also be useful for other research that needs to process large amounts of data with similar characteristics using MPI and CUDA.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call