Conference article

Classifying Languages by Dependency Structure. Typologies of Delexicalized Universal Dependency Treebanks

Xinying Chen
School of International Studies, Xi’an Jiaotong University, China / Department of Czech Language, University of Ostrava, Czech Republic

Kim Gerdes
LPP (CNRS), Sorbonne Nouvelle, France

Published in: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy

Linköping Electronic Conference Proceedings 139:8, pp. 54-63

Published: 2017-09-13

ISBN: 978-91-7685-467-9

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper shows how the current Universal Dependency treebanks can be used to cluster global structural features of the treebanks, revealing a purely structural syntactic typology of languages. Different uni- and multi-dimensional data extraction methods are explored and tested in order to assess both the coherence of the underlying syntactic data and the quality of the clustering methods themselves.
