Conference article

Towards Automatic Scoring of Cloze Items by Selecting Low-Ambiguity Contexts

Tobias Horsmann
Language Technology Lab, University of Duisburg-Essen, Germany

Torsten Zesch
Language Technology Lab, University of Duisburg-Essen, Germany


Published in: Proceedings of the third workshop on NLP for computer-assisted language learning at SLTC 2014, Uppsala University

Linköping Electronic Conference Proceedings 107:3, p. 33–42

NEALT Proceedings Series 22:3, p. 33–42


Published: 2014-11-11

ISBN: 978-91-7519-175-1

ISSN: 1650-3686 (print), 1650-3740 (online)


In second language learning, cloze tests (also known as fill-in-the-blank tests) are frequently used to assess students' learning progress. While these tests take little effort to prepare, they must be scored manually, because a gap usually admits a large number of correct solutions. In this paper, we examine whether the ambiguity of cloze items can be lowered to a point where automatic scoring becomes possible. We use the local context of a word to collect evidence of low ambiguity: we search for collocated word sequences and also take sentence-level structural information into account. We evaluate the effectiveness of our method in a user study on cloze items ranked by our method. For the top-ranked items (lowest ambiguity), the subjects provide the target word significantly more often than for the bottom-ranked items (59.9% vs. 36.5%). While this shows the potential of our method, we did not succeed in fully eliminating ambiguity. Thus, further research is necessary before fully automatic scoring becomes possible.
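The idea of ranking cloze items by how strongly the local context constrains the gap can be illustrated with a small sketch. This is not the method of the paper, whose details the abstract does not give. It assumes a toy corpus, a one-word context window on each side of the gap, and Shannon entropy over the observed gap fillers as the ambiguity score. The names `build_context_index`, `gap_entropy`, `rank_items` and `CORPUS` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (an assumption for illustration only, not data from the paper).
CORPUS = [
    "he drank strong coffee today",
    "she drank strong coffee again",
    "they drank strong coffee slowly",
    "we drank black coffee later",
    "we saw a dog outside",
    "i saw a cat outside",
    "you saw a bird outside",
    "he saw a car outside",
]


def build_context_index(sentences, window=1):
    """Map each (left, right) context to a Counter of words seen in the gap."""
    index = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(window, len(tokens) - window):
            left = tuple(tokens[i - window:i])
            right = tuple(tokens[i + 1:i + 1 + window])
            index[(left, right)][tokens[i]] += 1
    return index


def gap_entropy(index, left, right):
    """Entropy in bits of the fillers observed for a context.

    Lower values mean fewer competing fillers, i.e. lower ambiguity.
    Returns None if the context was never observed.
    """
    counts = index.get((tuple(left), tuple(right)))
    if not counts:
        return None
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def rank_items(index, items):
    """Sort (target, left, right) items from lowest to highest ambiguity."""
    scored = []
    for target, left, right in items:
        h = gap_entropy(index, left, right)
        # Unseen contexts carry no evidence, so they are ranked last.
        scored.append((math.inf if h is None else h, target))
    return sorted(scored)


index = build_context_index(CORPUS)
items = [
    ("dog", ["a"], ["outside"]),         # many nouns fit this gap
    ("strong", ["drank"], ["coffee"]),   # collocation narrows the choice
]
ranked = rank_items(index, items)
```

In this sketch the gap in "drank ___ coffee" ranks ahead of the gap in "a ___ outside", because the toy corpus offers fewer competing fillers for it. A realistic setup would need a large n-gram resource and, as the abstract notes, sentence-level structural information as well.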


cloze tests; language proficiency tests; automatic scoring


