Conference article

Using Broad Linguistic Complexity Modeling for Cross-Lingual Readability Assessment

Zarah Weiss

Xiaobin Chen

Detmar Meurers

Published in: Proceedings of the 10th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2021)

Linköping Electronic Conference Proceedings 177:4, pp. 38-54

Published: 2021-05-21

ISBN: 978-91-7929-625-4

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We investigate the readability classification of English and German reading materials for language learners, using a broad linguistic complexity feature set that supports parallel analysis of both languages. After illustrating the quality of the feature set by showing that it yields state-of-the-art classification performance on the established OneStopEnglish corpus (Vajjala & Lucic, 2018), we introduce the Spotlight corpus. This new data set contains graded reading materials produced by the same publisher for English and German, which supports an analysis comparing the linguistic characteristics of texts at different reading levels across languages. As far as we are aware, this is both the first readability corpus for German L2 learners and the first corpus with comparably classified reading material for learners across multiple languages. After discussing the first results for a readability classifier for German L2 learners, we show that the linguistic complexity analyses in the cross-language experiments identify features that successfully characterize the readability of texts for language learners across languages, as well as some language-specific characteristics of different reading levels.

Keywords

readability assessment, cross-lingual complexity analysis, foreign language learning
