Using Constraint Grammar for Chunking

Eckhard Bick
University of Southern Denmark, Odense, Denmark

Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Published: 2013-05-17

This paper presents and evaluates a novel and flexible chunking method using Constraint Grammar (CG) rules to introduce chunk edges in corpus annotation. Our method exploits preexisting (non-constituent) morphosyntactic annotation such as part-of-speech or function tags, but can also be made to work on raw text when integrated with other CG modules. The first version of the chunker was developed for German CG-annotated interview data, with a parallel English version derived from the German one, indicating a high degree of language-independence of the rules in the presence of generalized syntactic-functional tags (e.g. subject, object, modifier). Two different approaches are discussed, one for minimal, flat chunking, the other for deep, nested chunking. The system shows reasonable performance and robustness for both, achieving F-scores of 89.1 and 97.4 for nested and minimal chunking, respectively. XML markup is supported, and with a full set of rules, the tool can be used to convert CG annotation into complete constituent trees in VISL or TIGER format.
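
The idea of marking chunk edges on tokens that already carry tags can be illustrated with a toy sketch. This is not the paper's rule set and not CG syntax: the tag names, the `NP_TAGS` set, the bracket notation and the function `mark_chunk_edges` are all assumptions made for illustration. It only mimics the general principle of deciding on an edge from a token's own tag and the tags of its left and right neighbours, for the flat (minimal) case.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, part-of-speech tag)

# Hypothetical tag set: POS tags allowed inside a flat noun chunk.
NP_TAGS = {"DET", "ADJ", "N"}


def mark_chunk_edges(tokens: List[Token]) -> List[str]:
    """Add an opening or closing edge mark to tokens at chunk boundaries.

    A chunk here is simply a maximal run of tokens whose tag is in NP_TAGS.
    Each decision looks only at the token and its immediate neighbours.
    """
    marked_tokens = []
    for i, (word, pos) in enumerate(tokens):
        inside = pos in NP_TAGS
        prev_inside = i > 0 and tokens[i - 1][1] in NP_TAGS
        next_inside = i + 1 < len(tokens) and tokens[i + 1][1] in NP_TAGS

        marked = word
        if inside and not prev_inside:
            marked = "[np " + marked  # left edge: nothing chunkable to the left
        if inside and not next_inside:
            marked = marked + " ]"    # right edge: nothing chunkable to the right
        marked_tokens.append(marked)
    return marked_tokens


sentence = [("the", "DET"), ("old", "ADJ"), ("man", "N"),
            ("saw", "V"), ("a", "DET"), ("dog", "N")]
print(" ".join(mark_chunk_edges(sentence)))
# prints: [np the old man ] saw [np a dog ]
```

Nested chunking would additionally need function tags and dependency-like attachment information to decide which edges enclose which, which this sketch does not attempt.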


Chunking; Constraint Grammar; Syntactic Constituent Trees


