Communicative efficiency and syntactic predictability: A cross-linguistic study based on the Universal Dependencies corpora

Natalia Levshina
Leipzig University, Leipzig, Germany


Published in: Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies, 22 May, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 135:9, pp. 72-78

NEALT Proceedings Series 31:9, pp. 72-78


Published: 2017-05-29

ISBN: 978-91-7685-501-0

ISSN: 1650-3686 (print), 1650-3740 (online)


There is ample evidence that human communication is organized efficiently: more predictable information is usually encoded by shorter linguistic forms, and less predictable information by longer forms. The present study, based on the Universal Dependencies corpora, investigates whether word length can be predicted from average syntactic information content, defined as the average information content of a word given its counterpart in a dyadic syntactic relationship. The effect of this variable is tested on data from nine typologically diverse languages while controlling for other well-known parameters: word frequency and average word predictability based on the preceding and following words. Poisson generalized linear models and conditional random forests show that words with higher average syntactic information content are usually longer in most languages, although this effect often appears in interaction with average information content based on the neighbouring words. The results demonstrate that syntactic predictability should be considered as a separate factor in future work on communicative efficiency.
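
The notion of average information content of a word given its syntactic counterpart can be illustrated with a minimal sketch. It assumes that the measure is the mean of -log2 P(word | counterpart) over all occurrences of the word in head-dependent pairs, with probabilities estimated by relative frequency. This estimator, the function name `average_information_content`, and the toy pairs are illustrative assumptions, not the implementation used in the study.

```python
import math
from collections import Counter


def average_information_content(pairs):
    """Return {word: mean of -log2 P(word | counterpart)} over the word's occurrences.

    `pairs` is a list of (word, counterpart) tuples, one per dependency token.
    Probabilities are plain relative frequencies (an assumption of this sketch).
    """
    pair_counts = Counter(pairs)
    counterpart_counts = Counter(counterpart for _, counterpart in pairs)
    word_counts = Counter(word for word, _ in pairs)

    # Sum the surprisal of each occurrence of a word, grouped by pair type.
    totals = {}
    for (word, counterpart), n in pair_counts.items():
        p = n / counterpart_counts[counterpart]
        totals[word] = totals.get(word, 0.0) + n * -math.log2(p)

    # Average over all occurrences of the word.
    return {word: totals[word] / word_counts[word] for word in totals}


# Hypothetical toy data: (word, syntactic counterpart) pairs.
toy_pairs = [("the", "dog"), ("the", "dog"), ("a", "dog"), ("the", "cat")]
scores = average_information_content(toy_pairs)
```

In this toy data, "a" occurs only where "the" is the more likely alternative, so it receives a higher average information content than "the". Under the efficiency hypothesis described above, such words would be expected to be longer on average, once frequency and predictability from neighbouring words are controlled for.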




