This volume contains the articles presented at the Constraint Grammar workshop on methods, tools and applications, held on 30 September 2019 and co-located with the NoDaLiDa 2019 conference in Turku. This workshop series has been part of the NoDaLiDa conference since 2005, and this workshop is the eighth in the series, thereby emphasizing the Nordic roots of Constraint Grammar.
True to its tradition, the workshop may be characterised along two thematic lines: the development of constraint grammars analysing individual languages or specific aspects of their grammar, and presentations and discussions on practical language technology tools where CG is the key component. In addition, the workshop contained more general papers, dealing with theoretical issues relevant to any language.
As for the first theme, these proceedings contain papers presenting CG grammars for Tibetan (Faggionato and Garret), Lithuanian (Jagiella), and Greenlandic (Molich). The one on Tibetan looks at verb valency for texts from different historical phases of the Tibetan literary language, whereas the one on Lithuanian presents the first version of a general disambiguator for the language. Molich’s paper on Greenlandic looks at a particularly vexing problem of Greenlandic grammar: the disambiguation of conjunctional and adverbial functions of enclitic particles. In addition to being central in determining the overall structure of the Greenlandic sentence at hand, these functions also affect the performance of current work on Greenlandic to Danish machine translation. A paper with a similar scope is Schmirler and Arppe’s presentation of a set of rules for dealing with negation in Plains Cree.
Relevant to the second theme are two papers on grammar checking: Aldezabal, Arriola and Estarrona present a grammar-helping tool for Basque, and Wiechetek, Moshagen, Gaup and Omma use the workshop to launch a brand-new grammar checker for North Saami. Grammar checking tools have a long tradition within CG, but these two presentations introduce grammar checking to languages with a much richer morphology than usual.
The workshop also contained some more general papers. Bick’s “Tagging What Isn’t There” discusses methods for annotating information not explicitly present in the language under analysis (here: Danish). Being set in the context of an MT project from Danish to Greenlandic, the paper could be seen under the two previous categories as well, but the approach is kept at a general level, using Danish as an example language.
The last two papers look at the interaction between CG and different machine learning approaches. The paper by Yli-Jyrä shows that both CG and recursive neural networks have finite-state properties, and discusses the theoretical implications of this observation. Finally, the paper by Muischnek, Müürisep and Särg describes how CG is used to build gold standards for machine learning, by tagging the Estonian Universal Dependency corpus with CG.
As can be seen from this overview, CG holds its position as the dominant framework for morphology-rich languages. Greenlandic, Plains Cree and North Saami all belong in the morphology-rich corner of the typological spectrum, and the rest of the languages under scrutiny also possess more inflectional categories than the mainstream languages. Another characteristic of CG, its ability to provide analyses good enough to be used for practical applications, is also evident from the list of contributions.
On behalf of the workshop organizers
Eckhard Bick & Trond Trosterud