Conference article

Integrating large-scale web data and curated corpus data in a search engine supporting German literacy education

Sabrina Dittrich
Department of Linguistics, University of Tübingen, Germany

Zarah Weiss
Department of Linguistics, University of Tübingen, Germany

Hannes Schröter
German Institute for Adult Education – Leibniz Centre for Lifelong Learning, Germany

Detmar Meurers
Department of Linguistics, University of Tübingen, Germany / LEAD Graduate School and Research Network, University of Tübingen, Germany

Published in: Proceedings of the 8th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2019), September 30, Turku, Finland

Linköping Electronic Conference Proceedings 164:5, p. 41-56

NEALT Proceedings Series 39:5, p. 41-56

Published: 2019-09-30

ISBN: 978-91-7929-998-9

ISSN: 1650-3686 (print), 1650-3740 (online)


Reading material that is of interest and at the right level for learners is an essential component of effective language education. The web has long been identified as a valuable source of reading material due to the abundance and variability of materials it offers and its broad range of attractive and current topics. Yet, the web as a source of reading material can be problematic in low literacy contexts. We present ongoing work on a hybrid approach to text retrieval that combines the strengths of web search with retrieval from a high-quality, curated corpus resource. Our system, KANSAS Suche 2.0, supports retrieval and reranking based on criteria relevant for language learning in three different search modes: unrestricted web search, filtered web search, and corpus search. We demonstrate their complementary strengths and weaknesses with regard to coverage, readability, and suitability of the retrieved material for adult literacy and basic education. We show that their combination results in a versatile and suitable text retrieval approach for education in the language arts.


information retrieval, low literacy, readability assessment, German

