The problem of (semi-)automatic treebank conversion arises when converting between different schemas, such as from a language-specific schema to Universal Dependencies, or when converting from one Universal Dependencies version to the next. We propose a formalism based on top-down tree transducers to convert dependency trees. Building on a well-defined mechanism yields a robust transformation system in which rules have clear semantics and every transformation step is guaranteed to result in a well-formed tree, in contrast to previously proposed solutions. The rules depend only on the local context of the node being converted and rely on the dependency labels as well as the PoS tags. To demonstrate the efficiency of our approach, we created a rule set based on only 45 manually transformed sentences from the Hamburg Dependency Treebank. These rules can already transform annotations with both coverage and precision of more than 90%.
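The idea of local, top-down rule application can be illustrated with a minimal Python sketch. It is not the formalism or the rule set described in this paper: the `Node` class, the `RELABEL` table, the `convert` and `size` functions, and the two example rules (a label rewrite keyed on dependency label and PoS tag, and a head swap that turns a preposition-headed phrase into a noun-headed one) are assumptions chosen for illustration only. Each rule inspects just a node and its direct children, and every node of the input is emitted exactly once, so the output is again a tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    form: str
    pos: str
    deprel: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical relabeling rules keyed on (dependency label, PoS tag).
RELABEL = {("SUBJ", "NN"): "nsubj", ("DET", "ART"): "det"}


def convert(node: Node) -> Node:
    """Rewrite a node from its local context, then recurse into the subtrees."""
    # Illustrative structural rule: a preposition (PP) heading a single noun (PN)
    # becomes the noun (obl) heading the preposition (case).
    if node.deprel == "PP":
        nouns = [c for c in node.children if c.deprel == "PN"]
        if len(nouns) == 1:
            noun = nouns[0]
            others = [c for c in node.children if c is not noun]
            prep = Node(node.form, node.pos, "case")
            kids = [prep] + others + noun.children
            return Node(noun.form, noun.pos, "obl", [convert(k) for k in kids])
    # Default rule: relabel if a rule matches, otherwise keep the label.
    label = RELABEL.get((node.deprel, node.pos), node.deprel)
    return Node(node.form, node.pos, label, [convert(c) for c in node.children])


def size(node: Node) -> int:
    """Number of nodes in the tree rooted at node."""
    return 1 + sum(size(c) for c in node.children)


# "Der Hund wohnt in Hamburg" with assumed source-schema labels.
tree = Node("wohnt", "VVFIN", "S", [
    Node("Hund", "NN", "SUBJ", [Node("Der", "ART", "DET")]),
    Node("in", "APPR", "PP", [Node("Hamburg", "NE", "PN")]),
])
out = convert(tree)
```

In this sketch the list of children carries no word order; a practical implementation would also keep token indices so that the converted tree can be written back in sentence order.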
Lars Ahrenberg. 2015. Converting an English-Swedish parallel treebank to Universal Dependencies. In Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), pages 10–19, Uppsala, Sweden, August. Uppsala University, Uppsala, Sweden.
Kilian A. Foth, Arne Köhn, Niels Beuck, and Wolfgang Menzel. 2014. Because size does matter: The Hamburg Dependency Treebank. In Proceedings of the Language Resources and Evaluation Conference 2014. LREC, European Language Resources Association (ELRA).
Anders Johannsen, Héctor Martínez Alonso, and Barbara Plank. 2015. Universal Dependencies for Danish. In Markus Dickinson, Erhard Hinrichs, Agnieszka Patejuk, and Adam Przepiórkowski, editors, Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), pages 157–167, Warsaw, Poland.
Bevan Jones, Mark Johnson, and Sharon Goldwater. 2011. Formalizing semantic parsing with tree transducers. In Proceedings of the Australasian Language Technology Association Workshop 2011, pages 19–28, Canberra, Australia, December.
Andreas Maletti. 2010. Survey: Tree transducers in machine translation. In Henning Bordihn, Rudolf Freund, Thomas Hinze, Markus Holzer, Martin Kutrib, and Friedrich Otto, editors, Proc. 2nd Int. Workshop Non-Classical Models of Automata and Applications, volume 263 of books@ocg.at, pages 11–32. Österreichische Computer Gesellschaft.
Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal dependency annotation for multilingual parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 92–97, Sofia, Bulgaria, August. Association for Computational Linguistics.
Mehryar Mohri, Fernando Pereira, and Michael Riley. 2002. Weighted finite-state transducers in speech recognition. Computer Speech & Language, 16(1):69–88.
Joakim Nivre. 2014. Universal Dependencies for Swedish. In Proceedings of the Swedish Language Technology Conference (SLTC), Uppsala, Sweden, November. Uppsala University, Uppsala, Sweden.
Lilja Øvrelid and Petter Hohle. 2016. Universal Dependencies for Norwegian. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, May 23-28, 2016. European Language Resources Association (ELRA).
Sampo Pyysalo, Jenna Kanerva, Anna Missilä, Veronika Laippala, and Filip Ginter. 2015. Universal Dependencies for Finnish. In Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015), pages 163–172, Vilnius, Lithuania, May. Linköping University Electronic Press, Sweden.
Corentin Ribeyre, Djamé Seddah, and Éric Villemonte de la Clergerie. 2012. A linguistically-motivated 2-stage tree to graph transformation. In Proceedings of the 11th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+11), pages 214–222, Paris, France, September.
Juhi Tandon, Himani Chaudhry, Riyaz Ahmad Bhat, and Dipti Misra Sharma. 2016. Conversion from Paninian karakas to Universal Dependencies for Hindi dependency treebank. In Katrin Tomanek and Annemarie Friedrich, editors, Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016, LAW@ACL 2016, August 11, 2016, Berlin, Germany. The Association for Computer Linguistics.
James W. Thatcher. 1970. Generalized sequential machine maps. Journal of Computer and System Sciences, 4(4):339–367.
Francis M. Tyers and Mariya Sheyanova. 2017. Annotation schemes in North Sámi dependency parsing. In Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages, pages 66–75, St. Petersburg, Russia, January. Association for Computational Linguistics.