Conference article

A Linguistically-Informed Search Engine to Identify Reading Material for Functional Illiteracy Classes

Zarah Weiss
Department of Linguistics, Group, LEAD Graduate School & Research Network, University of Tübingen, Germany

Sabrina Dittrich
Department of Linguistics, Group, LEAD Graduate School & Research Network, University of Tübingen, Germany

Detmar Meurers
Department of Linguistics, Group, LEAD Graduate School & Research Network, University of Tübingen, Germany

Published in: Proceedings of the 7th Workshop on NLP for Computer Assisted Language Learning (NLP4CALL 2018) at SLTC, Stockholm, 7th November 2018

Linköping Electronic Conference Proceedings 152:9, p. 79-90

NEALT Proceedings Series 36:9, p. 79-90

Published: 2018-11-02

ISBN: 978-91-7685-173-9

ISSN: 1650-3686 (print), 1650-3740 (online)


We present KANSAS, a search engine designed to retrieve reading materials for functional illiterates and learners of German as a Second Language. The system allows teachers to refine their searches for teaching material by selecting appropriate readability levels and (de)prioritizing linguistic constructions. In addition to this linguistically-informed query result ranking, the system provides visual input enhancement for the selected linguistic constructions.

Our system combines state-of-the-art Natural Language Processing (NLP) with lightweight algorithms for identifying relevant linguistic constructions. We evaluated the system in two pilot studies, one on the identification of linguistic constructions and one on the identification of readability levels. Both pilots yielded promising results and are being followed by full-fledged performance studies and usability tests.


information retrieval, functional illiteracy, German, German as a Second Language

