Conference article

Revising the METU-Sabanci Turkish Treebank: An Exercise in Surface-Syntactic Annotation of Agglutinative Languages

Alicia Burga
Pompeu Fabra University, Barcelona, Spain

Alp Öktem
Pompeu Fabra University, Barcelona, Spain

Leo Wanner
ICREA and Pompeu Fabra University, Barcelona, Spain

Published in: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy

Linköping Electronic Conference Proceedings 139:6, p. 32-41

Published: 2017-09-13

ISBN: 978-91-7685-467-9

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

In this paper, we present a revision, in accordance with the principles of the Meaning-Text Theory (MTT), of the training set of the METU-Sabanci Turkish syntactic dependency treebank, which comprises 4997 sentences. MTT reflects the multilayered nature of language in a linguistic model in which each linguistic phenomenon is treated at its corresponding level(s). Our analysis of the METU-Sabanci syntactic relation tagset reveals that it encodes both deep-morphological and surface-syntactic phenomena, which should be separated according to the MTT model. We propose a schema and show that this schema also allows for a sound projection of the obtained surface annotation onto a deep-syntactic annotation, as needed for the implementation of downstream language understanding applications.

