Conference article

Using Constraint Grammar for Chunking

Eckhard Bick
University of Southern Denmark, Odense, Denmark

Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Linköping Electronic Conference Proceedings 85:7, p. 13-26

NEALT Proceedings Series 16:7, p. 13-26

Published: 2013-05-17

ISBN: 978-91-7519-589-6

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper presents and evaluates a novel and flexible chunking method using Constraint Grammar (CG) rules to introduce chunk edges in corpus annotation. Our method exploits preexisting (non-constituent) morphosyntactic annotation such as part-of-speech or function tags, but can also be made to work on raw text, integrated with other CG modules. The first version of the chunker was developed for German CG-annotated interview data, with a parallel English version derived from the German one, indicating a high degree of language-independence of the rules in the presence of generalized syntactic-functional tags (e.g. subject, object, modifier). Two different approaches are discussed: one for minimal, flat chunking, the other for deep, nested chunking. The system has a reasonable performance and robustness for both, achieving F-scores of 89.1 and 97.4 for nested and minimal chunking, respectively. XML markup is supported, and with a full set of rules, the tool can be used to convert CG annotation into complete constituent trees in VISL or TIGER format.
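For readers unfamiliar with the metric, the F-scores quoted above are harmonic means of precision and recall. The abstract does not say how chunks are matched against the gold standard, so the following Python sketch is only an illustration under an assumed scheme: a predicted chunk counts as correct when its start, end and label all match a gold chunk exactly. The function name `chunk_f_score`, the span representation and the toy data are hypothetical and not taken from the paper.

```python
from typing import Set, Tuple

# A chunk as (start token index, end token index exclusive, label).
# This representation is an assumption made for the illustration.
Span = Tuple[int, int, str]


def chunk_f_score(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching labelled spans."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: the third predicted chunk ends one token too early.
gold = {(0, 2, "NP"), (2, 3, "VP"), (3, 6, "NP"), (6, 9, "PP")}
predicted = {(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP"), (6, 9, "PP")}

precision, recall, f1 = chunk_f_score(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

The function returns values on a 0 to 1 scale; scores such as 89.1 and 97.4 correspond to the same quantity expressed as a percentage.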

Keywords

Chunking; Constraint Grammar; Syntactic Constituent Trees
