Conference article

Using the Entropy of N-Grams to Evaluate the Authenticity of Substitution Ciphers and Z340 in Particular

Tom S. Juzek
Saarland University, Campus A2.2, Rl.25, 66123 Saarbrilcken, Germany

Download article

Published in: Proceedings of the 2nd International Conference on Historical Cryptology, HistoCrypt 2019, June 23-26, 2019, Mons, Belgium

Linköping Electronic Conference Proceedings 158:13, p. 117-125

NEALT Proceedings Series 37:13, p. 117-125

Show more +

Published: 2019-06-12

ISBN: 978-91-7685-087-9

ISSN: 1650-3686 (print), 1650-3740 (online)


The present paper uses information theoretic entropy as a means to evaluate the authenticity of homophonic substitution ciphers. We motivate the use of entropy on n-grams and then validate its applicability, by using it on various true ciphers and pseudo-ciphers. Differences in entropy allow us to apply further formal analyses, e.g. support-vector machines, in order to make predictions about a potential cipher’s status. We train several support-vector machines and validate them. We then apply the models to two classic ciphers, the Zodiac Killer’s first major cipher, z408, which has been solved, and his second cipher, z340, which remains unsolved. The models correctly identify z408 as a substitution cipher. z340 is classified as an advanced cipher or pseudo-cipher.


cryptography cryptology substitution ciphers entropy n-gram-based analysis support-vector machines z340


Asa Ben-Hur, David Hom, Hava T. Siegelmann, and Vladimir Vapnik. 2002. Support vector clustering. Journal of Machine Learning Research, 2: 125-137.

Eric Corlett and Gerald Penn. 2010. An exact a* method for deciphering letter-substitution ciphers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 1040---1047, Stroudsburg, PA, USA. Association for Computational Linguistics.

Edward W. Forgy. 1965. Cluster analysis of multivariate data: Efficiency versus interpretability of classification. Biometrics, 21(3):768-769.

Robert Graysmith. 1987. Zodiac. Berkley, New York City, NY.

Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman. 2008. An Introduction to Mathematical Cryptography. Springer Publishing Company, Incorporated, 1 edition.

Dan Jurafsky, 2019. Lecture CS 124: From Languages to Information. Introduction to N-grams lecture notes.

Kevin Knight, 2013. Decipherment Tutorial. Workshop at the 20!3 Annual Meeting of the Association for Computational Linguistics - lecture notes.

Stuart P. Lloyd. 1982. Least squares quantization in pcm. IEEE Transactions on Information Theory, 28:129-137.

Project Gutenberg, 2018. Project Gutenberg.

Python Software Foundation, 2018. Python: A dynamic, open source programming language.

Sujith Ravi and Kevin Knight. 2008. Attacking decipherment problems optimally with low-order ngram models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 812-819, Stroudsburg, PA, USA. Association for Computational Linguistics.

Sujith Ravi and Kevin Knight. 2011. Bayesian inference for zodiac and other homophonic ciphers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT ’11, pages 239-247, Stroudsburg, PA, USA. Association for Computational Linguistics.

Claude Elwood Shannon. 1948. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379-423, 7.

The Wikimedia Foundation, 2018a. Wikipedia, the free encyclopedia.

The Wikimedia Foundation, 2018b. Wikisource, the free library., 2012. Annotated solution to the 408 cipher, based on the Harden worksheets. http://zodiackillerciphers.corn/408/key.html#1.

Citations in Crossref