Classifying World War II era ciphers with machine learning

Brooke Dalton, Mark Stamp

doi:10.1080/01611194.2024.2304888

Abstract

We determine the accuracy with which machine learning and deep learning techniques can classify selected World War II era ciphers when only ciphertext is available. The specific ciphers considered are Enigma, M-209, Sigaba, Purple, and Typex. We experiment with three classic machine learning models, namely, Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), and Random Forest (RF). We also experiment with four deep learning models: Multi-Layer Perceptrons (MLP), Long Short-Term Memory (LSTM), Extreme Learning Machines (ELM), and Convolutional Neural Networks (CNN). Each model is trained on features consisting of histograms, digrams, and raw ciphertext letter sequences. Furthermore, the classification problem is considered under four distinct scenarios: fixed plaintext with fixed keys, random plaintext with fixed keys, fixed plaintext with random keys, and random plaintext with random keys. Under the most realistic scenario, given 1,000 characters per ciphertext, we are able to distinguish the ciphers with more than 97% accuracy. In addition, we consider the accuracy of a subset of the learning techniques as a function of the ciphertext length. We find that the classic learning models outperform the deep learning models that we tested, and that ciphers that are more similar in design are somewhat more challenging to distinguish.
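
As an illustration of the histogram and digram features mentioned above, the following minimal sketch computes both from a ciphertext string. The function names, the restriction to the 26-letter alphabet A-Z, the use of overlapping digrams, and the normalization to relative frequencies are all assumptions made for this example; the abstract does not specify how the features were actually constructed.

```python
from collections import Counter
from string import ascii_uppercase

# Assumption: ciphertext is drawn from the 26 uppercase letters A-Z.
ALPHABET = ascii_uppercase


def _letters(ciphertext):
    """Keep only A-Z characters, after uppercasing."""
    return [c for c in ciphertext.upper() if c in ALPHABET]


def letter_histogram(ciphertext):
    """Return a 26-element vector of relative letter frequencies."""
    text = _letters(ciphertext)
    counts = Counter(text)
    n = len(text)
    return [counts[c] / n if n else 0.0 for c in ALPHABET]


def digram_histogram(ciphertext):
    """Return a 676-element vector of relative frequencies of
    overlapping letter pairs, ordered AA, AB, ..., ZZ."""
    text = _letters(ciphertext)
    pairs = Counter(a + b for a, b in zip(text, text[1:]))
    total = max(len(text) - 1, 0)
    return [
        pairs[a + b] / total if total else 0.0
        for a in ALPHABET
        for b in ALPHABET
    ]
```

Either vector could then be passed, one row per ciphertext, to a classifier of the kind listed above, with the cipher name as the label.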

Full Text