Validating Bundled Gap Filling -- Empirical Evidence for Ambiguity Reduction and Language Proficiency Testing Capabilities

Niklas Meyer
Language Technology Lab, University of Duisburg-Essen, Duisburg, Germany

Michael Wojatzki
Language Technology Lab, University of Duisburg-Essen, Duisburg, Germany

Torsten Zesch
Language Technology Lab, University of Duisburg-Essen, Duisburg, Germany

Published in: Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC, Umeå, 16th November 2016

Linköping Electronic Conference Proceedings 130:7, pp. 51-59

Published: 2016-11-15

ISBN: 978-91-7685-633-8

ISSN: 1650-3686 (print), 1650-3740 (online)


Bundled gap filling exercises (Wojatzki et al., 2016) were recently introduced as a promising new exercise type to complement or even replace single gap-fill tasks. However, it has not yet been confirmed that the proposed construction method works as intended, and it remains to be investigated whether bundled gap-fill tests are suitable for assessing language proficiency. In this paper, we address both issues by varying the construction method and by conducting a user study with 75 participants in which we also measure externally validated language proficiency. We find that the originally proposed way of constructing bundles does indeed minimize their ambiguity, but that further investigation is needed to determine which aspects of language proficiency they actually measure.


Gap-filling, language proficiency testing, NLP


Roberta G. Abraham and Carol A. Chapelle. 1992. The meaning of cloze test scores: An item difficulty perspective. The Modern Language Journal, 76(4):468–479.

Lyle F. Bachman. 1982. The trait structure of cloze test scores. TESOL Quarterly, pages 61–70.

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3):209–226.

James Dean Brown. 1980. Relative merits of four methods for scoring cloze tests. The Modern Language Journal, 64(3):311–317.

James D. Brown. 1989. Cloze Item Difficulty. JALT Journal, 11(1):46–67.

Mary Anne Chavez-Oller, Tetsuro Chihara, Kelley A. Weaver, and John W. Oller. 1985. When are cloze items sensitive to constraints across sentences? Language Learning, 35(2):181–206.

Stanley F. Chen and Joshua Goodman. 1999. An Empirical Study of Smoothing Techniques for Language Modeling. Computer Speech & Language, 13(4):359–393.

Tetsuro Chihara, John Oller, Kelley Weaver, and Mary Anne Chavez-Oller. 1977. Are cloze items sensitive to constraints across sentences? Language Learning, 27(1):63–70.

Donald K. Darnell. 1968. The development of an English language proficiency test of foreign students, using a clozentropy procedure. Final report.

Christine Klein-Braley and Ulrich Raatz. 1982. Der C-Test: ein neuer Ansatz zur Messung allgemeiner Sprachbeherrschung. AKS-Rundbrief, 4:23–37.

Miyoko Kobayashi. 2002. Cloze tests revisited: Exploring item characteristics with special attention to scoring methods. The Modern Language Journal, 86(4):571–586.

Henry Scheffé. 1953. A method for judging all contrasts in the analysis of variance. Biometrika, 40(1-2):87–110.

Wilson L. Taylor. 1953. “Cloze Procedure”: A New Tool For Measuring Readability. Journalism Quarterly, 30(4):415–433.

John W. Tukey. 1949. Comparing individual means in the analysis of variance. Biometrics, pages 99–114.

Marjorie Wesche and Sima T. Paribakht. 1994. Enhancing Vocabulary Acquisition through Reading: A Hierarchy of Text-Related Exercise Types. Paper presented at the AAAL ’94 Conference.

Michael Wojatzki, Oren Melamud, and Torsten Zesch. 2016. Bundled gap filling: A new paradigm for unambiguous cloze exercises. In Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, pages 172–181, San Diego, CA, June. Association for Computational Linguistics.

Deniz Yuret. 2012. FASTSUBS: An efficient and exact procedure for finding the most likely lexical substitutes based on an n-gram language model. Signal Processing Letters, IEEE, 19(11):725–728.

Amir Zeldes. 2016. The GUM Corpus: Creating Multilayer Resources in the Classroom. Language Resources and Evaluation, pages 1–32.