Conference article

Quantitative Comparative Syntax on the Cantonese-Mandarin Parallel Dependency Treebank

Tak-sum Wong
City University of Hong Kong

Kim Gerdes
Sorbonne Nouvelle, LPP (CNRS), Paris, France

Herman Leung
City University of Hong Kong

John Lee
City University of Hong Kong

Published in: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy

Linköping Electronic Conference Proceedings 139:30, pp. 266-275

Published: 2017-09-13

ISBN: 978-91-7685-467-9

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper describes a new Cantonese-Mandarin parallel dependency treebank. We discuss the extent to which the treebank supports comparative measures for quantifying structural differences between the two languages. After presenting syntactic differences between the two languages, we compute various frequency measures on the treebank. We present the results and discuss whether they reflect differences in text genre, differences in annotation scheme design, or actual structural differences. Finally, we compare the structural differences with previous accounts of the observed construction.



