Published: 2019-06-12
ISBN: 978-91-7685-087-9
ISSN: 1650-3686 (print), 1650-3740 (online)
The present paper uses information theoretic entropy as a means to evaluate the authenticity of homophonic substitution ciphers. We motivate the use of entropy on n-grams and then validate its applicability, by using it on various true ciphers and pseudo-ciphers. Differences in entropy allow us to apply further formal analyses, e.g. support-vector machines, in order to make predictions about a potential cipher’s status. We train several support-vector machines and validate them. We then apply the models to two classic ciphers, the Zodiac Killer’s first major cipher, z408, which has been solved, and his second cipher, z340, which remains unsolved. The models correctly identify z408 as a substitution cipher. z340 is classified as an advanced cipher or pseudo-cipher.
cryptography
cryptology
substitution ciphers
entropy
n-gram-based analysis
support-vector machines
z340
Asa Ben-Hur, David Hom, Hava T. Siegelmann, and Vladimir Vapnik. 2002. Support vector clustering. Journal of Machine Learning Research, 2: 125-137.
Eric Corlett and Gerald Penn. 2010. An exact a* method for deciphering letter-substitution ciphers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 1040---1047, Stroudsburg, PA, USA. Association for Computational Linguistics.
Edward W. Forgy. 1965. Cluster analysis of multivariate data: Efficiency versus interpretability of classification. Biometrics, 21(3):768-769.
Robert Graysmith. 1987. Zodiac. Berkley, New York City, NY.
Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman. 2008. An Introduction to Mathematical Cryptography. Springer Publishing Company, Incorporated, 1 edition.
Dan Jurafsky, 2019. Lecture CS 124: From Languages to Information. Introduction to N-grams – lecture notes. http://web.stanford.edu/class/cs124/.
Kevin Knight, 2013. Decipherment Tutorial. Workshop at the 20!3 Annual Meeting of the Association for Computational Linguistics - lecture notes. https://kevincrawfordknight.github.io/extra/acl-tutorial-13-decipher-final.pdf.
Stuart P. Lloyd. 1982. Least squares quantization in pcm. IEEE Transactions on Information Theory, 28:129-137.
Project Gutenberg, 2018. Project Gutenberg. http://www.gutenberg.org.
Python Software Foundation, 2018. Python: A dynamic, open source programming language. https://www.python.org/.
Sujith Ravi and Kevin Knight. 2008. Attacking decipherment problems optimally with low-order ngram models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 812-819, Stroudsburg, PA, USA. Association for Computational Linguistics.
Sujith Ravi and Kevin Knight. 2011. Bayesian inference for zodiac and other homophonic ciphers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT ’11, pages 239-247, Stroudsburg, PA, USA. Association for Computational Linguistics.
Claude Elwood Shannon. 1948. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379-423, 7.
The Wikimedia Foundation, 2018a. Wikipedia, the free encyclopedia. https://wikipedia.org/
The Wikimedia Foundation, 2018b. Wikisource, the free library. https://en.wikisource.org/wiki/Author:Zodiac_Killer#Letters.
Zodiackillerciphers.com, 2012. Annotated solution to the 408 cipher, based on the Harden worksheets. http://zodiackillerciphers.corn/408/key.html#1.