Communicative efficiency and syntactic predictability: A cross-linguistic study based on the Universal Dependencies corpora

Natalia Levshina
Leipzig University, Leipzig, Germany

In: Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies, 22 May, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 135:9, pp. 72-78

NEALT Proceedings Series 31:9, pp. 72-78

Published: 2017-05-29

ISBN: 978-91-7685-501-0

ISSN: 1650-3686 (print), 1650-3740 (online)


There is ample evidence that human communication is organized efficiently: more predictable information is usually encoded by shorter linguistic forms, and less predictable information by longer forms. The present study, which is based on the Universal Dependencies corpora, investigates whether word length can be predicted from average syntactic information content, defined as the average information content of a word given its counterpart in a dyadic syntactic relationship. The effect of this variable is tested on data from nine typologically diverse languages while controlling for other well-known parameters: word frequency and average word predictability based on the preceding and following words. Poisson generalized linear models and conditional random forests show that words with higher average syntactic information content are usually longer in most languages, although this effect often appears in interaction with average information content based on the neighbouring words. The results demonstrate that syntactic predictability should be considered as a separate factor in future work on communicative efficiency.
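
As an illustration of the measure defined in the abstract, the following minimal sketch computes an average information content for each word given its syntactic counterpart. It assumes one common formalization, the mean of -log2 P(word | counterpart) over all dependency pairs in which the word occurs, with probabilities estimated from raw pair counts. The abstract does not spell out the exact formula, so the estimator, the function name average_syntactic_information, and the toy (word, counterpart) pairs are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def average_syntactic_information(pairs):
    """Return {word: mean of -log2 P(word | counterpart)}.

    `pairs` holds one (word, counterpart) tuple per dependency relation.
    Assumption: P(word | counterpart) is the relative frequency of the pair
    among all pairs that share the same counterpart.
    """
    pairs = list(pairs)
    pair_counts = Counter(pairs)
    counterpart_counts = Counter(c for _, c in pairs)
    word_counts = Counter(w for w, _ in pairs)

    # Sum the surprisal of every occurrence of each word.
    totals = defaultdict(float)
    for (word, counterpart), n in pair_counts.items():
        p = n / counterpart_counts[counterpart]
        totals[word] += n * -math.log2(p)

    # Divide by the number of occurrences to get the per-word average.
    return {w: totals[w] / word_counts[w] for w in totals}


# Toy data (invented for illustration, not taken from any corpus).
toy_pairs = [
    ("dog", "the"),
    ("cat", "the"),
    ("dog", "barks"),
    ("dog", "the"),
]
scores = average_syntactic_information(toy_pairs)
for word, score in sorted(scores.items()):
    print(f"{word}: {score:.3f} bits")
```

In this toy data "cat" is less predictable from its counterparts than "dog" and therefore receives the higher score. In a study like the one described, such per-word scores would serve as a predictor of word length alongside frequency and predictability from neighbouring words.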


