Conference article

A Dependency Treebank for Kurmanji Kurdish

Memduh Gökirmak
Department of Computer Engineering, Istanbul Technical University, Turkey

Francis M. Tyers
School of Linguistics, Higher School of Economics, Russia

Published in: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy

Linköping Electronic Conference Proceedings 139:9, pp. 64-72

Published: 2017-09-13

ISBN: 978-91-7685-467-9

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper describes the development of the first syntactically annotated corpus of Kurmanji Kurdish. The corpus provided the data for one of the surprise languages in the 2017 CoNLL shared task on parsing Universal Dependencies. We describe how the corpus was prepared, discuss some Kurmanji-specific constructions that required special treatment, and give results for parsing Kurdish with two popular data-driven parsers.
