Bootstrapping UD treebanks for Delexicalized Parsing

Prasanth Kolachina
University of Gothenburg, Sweden

Aarne Ranta
University of Gothenburg, Sweden

Ladda ner artikel

Ingår i: Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30 - October 2, Turku, Finland

Linköping Electronic Conference Proceedings 167:2, s. 15--24

NEALT Proceedings Series 42:2, p. 15--24

Visa mer +

Publicerad: 2019-10-02

ISBN: 978-91-7929-995-8

ISSN: 1650-3686 (tryckt), 1650-3740 (online)


Standard approaches to treebanking traditionally employ a waterfall model (Sommerville, 2010), where annotation guidelines guide the annotation process and insights from the annotation process in turn lead to subsequent changes in the annotation guidelines. This process remains a very expensive step in creating linguistic resources for a target language, necessitates both linguistic expertise and manual effort to develop the annotations and is subject to inconsistencies in the annotation due to human errors. In this paper, we propose an alternative approach to treebanking---one that requires writing grammars. This approach is motivated specifically in the context of Universal Dependencies, an effort to develop uniform and cross-lingually consistent treebanks across multiple languages. We show here that a bootstrapping approach to treebanking via interlingual grammars is plausible and useful in a process where grammar engineering and treebanking are jointly pursued when creating resources for the target language. We demonstrate the usefulness of synthetic treebanks in the task of delexicalized parsing. Our experiments reveal that simple models for treebank generation are cheaper than human annotated treebanks, especially in the lower ends of the learning curves for delexicalized parsing, which is relevant in particular in the context of low-resource languages.


Dependency parsing Universal Dependencies Abstract syntax trees


Inga referenser tillgängliga

Citeringar i Crossref