Tagging What Isn’t There: Enriching CG Annotation With Implicit Information

Eckhard Bick
Institute of Language and Communication, University of Southern Denmark, Denmark

Part of: Proceedings of the NoDaLiDa 2019 Workshop on Constraint Grammar - Methods, Tools and Applications, 30 September 2019, Turku, Finland

Linköping Electronic Conference Proceedings 168:2, pp. 5-11

NEALT Proceedings Series 33:2, pp. 5-11

Published: 2019-12-03

ISBN: 978-91-7929-918-7

ISSN: 1650-3686 (print), 1650-3740 (online)


This paper examines ways to make existing Constraint Grammar (CG) annotation grammatically more explicit, allowing corpus users and application programs, such as machine translation (MT), to refer to context-implied grammatical features more directly. Two types of categories are addressed. First, morphological categories are propagated to words that leave them under-specified (e.g. number and definiteness for Danish adjectives) or unexpressed (e.g. person-number for Danish verbs). Second, new categories are introduced, such as aspect and future tense for Danish, which do not exist in the source language but may be morphologically explicit in a given MT target language. In a pilot evaluation of four categories in the context of Danish-Greenlandic MT, the implemented enrichment grammar for Danish achieved F-scores of 97% for propagated categories and 85% for new categories. In addition to feature tagging, structural annotation is also made more explicit by adding secondary dependency links, e.g. for the subjects of relative and infinitive clauses, as well as attribute links between subject complements and subjects.
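The idea of feature propagation can be illustrated outside CG itself. The following is a minimal Python sketch, not the paper's CG rules: it assumes a toy dependency-annotated token list with hypothetical tag names (`S`/`P` for number, `DEF`/`IDF` for definiteness, `1P` and similar for person-number) and function labels (`@>N` for prenominal modifiers, `@SUBJ` for subjects) only loosely modeled on CG notation. It copies number onto a prenominal adjective from its head noun, definiteness from a determiner on the same noun, and the subject's person-number onto the finite verb.

```python
from dataclasses import dataclass, field

# Hypothetical feature inventories; the tag names are illustrative only.
NUMBER = {"S", "P"}
PERSON_NUMBER = {"1S", "2S", "3S", "1P", "2P", "3P"}


@dataclass
class Token:
    form: str
    pos: str          # word class, e.g. "N", "ADJ", "DET", "PERS", "V"
    func: str         # syntactic function label, e.g. "@>N", "@SUBJ"
    head: int         # 1-based index of the head token; 0 marks the root
    feats: set = field(default_factory=set)


def dependents(tokens, i):
    """Return all tokens whose head is the (1-based) token i."""
    return [t for t in tokens if t.head == i]


def propagate(tokens):
    """Add context-implied features to under-specified tokens, in place (simplified)."""
    for tok in tokens:
        if tok.head == 0:
            continue
        head = tokens[tok.head - 1]
        if tok.pos == "ADJ" and tok.func == "@>N" and head.pos == "N":
            # Number is taken from the head noun ...
            tok.feats |= head.feats & NUMBER
            # ... but definiteness from a definite determiner on the same noun,
            # because in "de store huse" the noun itself is in indefinite form.
            # Possessives and genitives are ignored in this sketch.
            dets = [d for d in dependents(tokens, tok.head) if d.pos == "DET"]
            tok.feats.add("DEF" if any("DEF" in d.feats for d in dets) else "IDF")
        elif tok.func == "@SUBJ" and head.pos == "V":
            # Danish finite verbs do not inflect for person-number,
            # so the subject's person-number is copied onto the verb.
            head.feats |= tok.feats & PERSON_NUMBER
    return tokens


# "vi ser de store huse" ("we see the big houses")
sent = [
    Token("vi", "PERS", "@SUBJ", 2, {"1P", "NOM"}),
    Token("ser", "V", "@FS-STA", 0, {"PR"}),
    Token("de", "DET", "@>N", 5, {"DEF", "P"}),
    Token("store", "ADJ", "@>N", 5),
    Token("huse", "N", "@<ACC", 2, {"P", "IDF"}),
]
propagate(sent)
for t in sent:
    print(t.form, sorted(t.feats))
# ser -> ['1P', 'PR'], store -> ['DEF', 'P']
```

The adjective "store" is a good test case because its form alone is ambiguous between plural and definite singular; only the surrounding noun phrase resolves it. A full enrichment grammar would also need to handle coordination, relative pronouns as subjects, and other contexts that this sketch leaves out.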


Keywords: Constraint Grammar, Morphology, Feature Propagation, Machine Translation, Tense-Aspect-Mode (TAM) tagging


