Abstract

Algorithmic decipherment is a prime example of a truly unsupervised problem. The first step in the decipherment process is the identification of the encrypted language. We propose three methods for determining the source language of a document enciphered with a monoalphabetic substitution cipher. The best method achieves 97% accuracy on 380 languages. We then present an approach to decoding anagrammed substitution ciphers, in which the letters within words have been arbitrarily transposed. It obtains an average decryption word accuracy of 93% on a set of 50 ciphertexts in 5 languages. Finally, we report results on the Voynich manuscript, an unsolved fifteenth-century cipher, which suggest Hebrew as the language of the document.

Highlights

  • The Voynich manuscript is a medieval codex consisting of 240 pages written in a unique script, which has been referred to as the world’s most important unsolved cipher (Schmeh, 2013).

  • Inspired by the mystery of both the Voynich manuscript and the undeciphered ancient scripts, we develop a series of … [Transactions of the Association for Computational Linguistics, vol. 4, pp. 75–86, 2016]

  • The development and test parts were separately enciphered with a random substitution cipher
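The enciphering step in the last highlight can be illustrated with a minimal sketch. It assumes a lowercase Latin alphabet and uses hypothetical helper names (`random_substitution_key`, `encipher`, `invert`) and toy texts that do not come from the paper; the actual data and preprocessing may differ.

```python
import random
import string


def random_substitution_key(alphabet=string.ascii_lowercase, seed=None):
    """Return a dict mapping each plaintext letter to a unique cipher letter."""
    rng = random.Random(seed)
    shuffled = list(alphabet)
    rng.shuffle(shuffled)
    return dict(zip(alphabet, shuffled))


def encipher(text, key):
    """Substitute every letter found in the key; keep spaces and other symbols."""
    return "".join(key.get(ch, ch) for ch in text)


def invert(key):
    """Build the decryption key by swapping plaintext and cipher letters."""
    return {cipher: plain for plain, cipher in key.items()}


# Toy stand-ins for the development and test parts (assumed, not from the paper).
dev_text = "the quick brown fox"
test_text = "jumps over the lazy dog"

# Each part gets its own independently drawn key.
dev_key = random_substitution_key(seed=1)
test_key = random_substitution_key(seed=2)

dev_cipher = encipher(dev_text, dev_key)
test_cipher = encipher(test_text, test_key)
```

Because the key is a one-to-one mapping, word boundaries, word lengths, and repeated-letter patterns all survive encipherment, which is the kind of information a language identification method can still exploit.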


Summary

Introduction

The Voynich manuscript is a medieval codex consisting of 240 pages written in a unique script, which has been referred to as the world’s most important unsolved cipher (Schmeh, 2013). The type of cipher that was used to generate the text is unknown; a number of theories have been proposed, including substitution and transposition ciphers, an abjad (a writing system in which vowels are not written), steganography, semi-random schemes, and an elaborate hoax. Identification of the underlying language has been crucial for the decipherment of ancient scripts, including Egyptian hieroglyphics (Coptic), Linear B (Greek), and Mayan glyphs (Ch’olti’). Even the order of characters within text may be in doubt; in Egyptian hieroglyphic inscriptions, for instance, the symbols were sometimes rearranged within a word in order to create a more elegant inscription (Singh, 2011). Another complicating factor is the omission of vowels in some writing systems.
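The two complicating factors mentioned above, reordering of symbols within words and omission of vowels, can be made concrete with a small sketch. The function names are hypothetical, the vowel set assumes English, and sorting the letters alphabetically is only one convenient way to model an arbitrary within-word transposition; none of this is taken from the paper itself.

```python
def anagram_words(text):
    """Discard within-word letter order by sorting the letters of each word."""
    return " ".join("".join(sorted(word)) for word in text.split())


def drop_vowels(text, vowels="aeiou"):
    """Imitate an abjad-style script by removing vowel letters."""
    return "".join(ch for ch in text if ch not in vowels)


sample = "the order of letters"
anagrammed = anagram_words(sample)   # "eht deorr fo eelrstt"
vowelless = drop_vowels(sample)      # "th rdr f lttrs"
```

In both cases the word boundaries are preserved, but the information needed to recover each word shrinks: anagramming keeps only the multiset of letters, and vowel omission keeps only the consonant skeleton.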
