This work demonstrates cipher-type identification using machine learning. Identifying the cipher type behind a ciphertext has recently attracted research interest because it supports faster and more effective cryptanalysis of an encryption algorithm. As security concerns have grown, obfuscation is increasingly used alongside encryption to keep the algorithm hidden, which gives rise to the ciphertext identification challenge. We approach ciphertext classification with both image processing and natural language processing methods. A CNN was used for image processing, whereas text-CNN, Transformer, and BERT models were used as natural language processing tools. To train the proposed classification models, two types of datasets were generated: image data and text data. This study compares the experimental results obtained from various CNN, Transformer, and BERT architectures, and also compares them with other recent work. The proposed BERT model is found to be the most effective at correctly classifying ciphertext, outperforming the other Transformer and CNN-based models. This work is intended to help cryptanalysts analyze an encryption algorithm in less time.
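The abstract does not say how the image and text datasets were built from ciphertext, so the following is only a minimal sketch of one plausible preparation step, not the authors' method. It assumes each ciphertext is available as raw bytes, that an image sample is a zero-padded grayscale grid of byte values, and that a text sample is a string of hex byte tokens. The function names `ciphertext_to_image` and `ciphertext_to_text` and the `width` parameter are hypothetical.

```python
import math


def ciphertext_to_image(data: bytes, width: int = 16) -> list[list[int]]:
    """Lay ciphertext bytes out row by row as a grayscale grid (values 0-255).

    Assumption: the last row is zero-padded to the full width.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = max(1, math.ceil(len(data) / width))
    padded = data.ljust(rows * width, b"\x00")  # zero-pad the final row
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]


def ciphertext_to_text(data: bytes) -> str:
    """Render ciphertext as space-separated two-digit hex tokens.

    Assumption: one token per byte, suitable as input to a text tokenizer.
    """
    return " ".join(f"{b:02x}" for b in data)


# Illustrative input only; a real sample would be the output of a cipher.
sample = bytes(range(20))
image = ciphertext_to_image(sample, width=8)
text = ciphertext_to_text(sample)
```

Under these assumptions, the grid could be fed to an image classifier and the token string to a text model, each labeled with the cipher that produced the sample.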