The Role of Diacritics in Increasing the Difficulty of Arabic Lexical Recognition Tests

Osama Hamed
Language Technology Lab, University of Duisburg-Essen, Germany

Torsten Zesch
Language Technology Lab, University of Duisburg-Essen, Germany

Published in: Proceedings of the 7th Workshop on NLP for Computer Assisted Language Learning (NLP4CALL 2018) at SLTC, Stockholm, 7th November 2018

Linköping Electronic Conference Proceedings 152:3, pp. 23-31

NEALT Proceedings Series 36:3, pp. 23-31

Published: 2018-11-02

ISBN: 978-91-7685-173-9

ISSN: 1650-3686 (print), 1650-3740 (online)


Lexical recognition tests (LRTs) are widely used to assess learners' vocabulary size. We investigate the role that diacritics play in increasing the difficulty of an Arabic lexical recognition test. We implement an NLP pipeline to reliably estimate the frequency of diacritized word forms. We conduct a user study and compare Arabic LRTs in three settings: one without diacritics, and two diacritized using either the most frequent or the least frequent diacritized form of each word. We find that using infrequent diacritics is the more effective way to increase the difficulty of Arabic LRTs.
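
To illustrate the idea of ranking diacritized forms by frequency, the following is a minimal sketch and not the authors' pipeline. It assumes a corpus that is already tokenized and diacritized, treats the Unicode marks U+064B to U+0652 as the diacritics to strip, and uses hypothetical helper names (`strip_diacritics`, `diacritized_form_counts`, `most_and_least_frequent`) that do not come from the paper.

```python
import re
from collections import Counter, defaultdict

# Assumption: "diacritics" here means the Arabic marks U+064B..U+0652
# (tanween, fatha, damma, kasra, shadda, sukun).
DIACRITICS = re.compile(r"[\u064B-\u0652]")


def strip_diacritics(word):
    """Return the word with the assumed diacritic marks removed."""
    return DIACRITICS.sub("", word)


def diacritized_form_counts(tokens):
    """Map each undiacritized base to a Counter of its diacritized forms."""
    counts = defaultdict(Counter)
    for tok in tokens:
        base = strip_diacritics(tok)
        if base != tok:  # skip tokens that carry no diacritics
            counts[base][tok] += 1
    return counts


def most_and_least_frequent(counts, base):
    """Return (most frequent, least frequent) diacritized form of `base`."""
    ranked = counts[base].most_common()  # sorted by descending count
    if not ranked:
        raise KeyError(f"no diacritized forms observed for {base!r}")
    return ranked[0][0], ranked[-1][0]


# Toy input (hypothetical, not data from the paper):
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf-ta-ba with three fathas
KUTUB = "\u0643\u064F\u062A\u064F\u0628"         # kaf-ta-ba with two dammas
BARE = "\u0643\u062A\u0628"                      # the same letters, no diacritics

tokens = [KATABA, KUTUB, KATABA, BARE, KATABA]
counts = diacritized_form_counts(tokens)
frequent, infrequent = most_and_least_frequent(counts, BARE)
```

In this toy input, `frequent` is the form with fathas (seen three times) and `infrequent` is the form with dammas (seen once); the three test settings described above would then present the bare form, `frequent`, or `infrequent`.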


Lexical Recognition Tests, Arabic LRTs, Vocabulary Size, Diacritics, Frequency Counts, Test Difficulty/Generation


