Published: 2017-05-29
ISBN: 978-91-7685-501-0
ISSN: 1650-3686 (print), 1650-3740 (online)
Abstract syntax is a tectogrammatical tree representation that can be shared between languages. It is used for programming languages in compilers, and has been adapted to natural languages in GF (Grammatical Framework). Recent work has shown how GF trees can be converted to UD (Universal Dependencies) trees, making it possible to generate parallel synthetic treebanks for the 30 languages currently covered by GF. This paper attempts to invert the mapping: take UD trees from standard treebanks and reconstruct GF trees from them. Such a conversion is potentially useful in bootstrapping treebanks by translation. It can also help GF-based interlingual translation by providing a robust, efficient front end. However, since UD trees are based on natural (as opposed to generated) data and are built manually or by machine learning (as opposed to rules), the conversion is not trivial. This paper presents a basic algorithm, which is essentially based on inverting the GF-to-UD conversion. This method covers around 70% of nodes, and the rest can be covered by approximate backup strategies. Analysing the reasons for the incompleteness reveals structures missing in GF grammars, but also some problems in UD treebanks.