Conference article

Optimizing the Oslo-Bergen Tagger

Eckhard Bick
University of Southern Denmark, Odense, Denmark

Kristin Hagen
University of Oslo, Norway

Anders Nøklestad
University of Oslo, Norway

Published in: Proceedings of the Workshop on “Constraint Grammar - methods, tools and applications” at NODALIDA 2015, May 11-13, 2015, Institute of the Lithuanian Language, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 113:2, p. 11-17

NEALT Proceedings Series 24:2, p. 11-17

Published: 2015-06-17

ISBN: 978-91-7519-037-2

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

In this paper we discuss and evaluate machine learning-based optimization of a Constraint Grammar for Norwegian Bokmål (OBT). The original linguist-written rules are reiteratively re-ordered, re-sectioned and systematically modified based on their performance on a hand-annotated training corpus. We discuss the interplay of various parameters and propose a new method, continuous sectionizing. For the best evaluated parameter constellation, part-of-speech F-score improvement was 0.31 percentage points for the first pass in a 5-fold cross evaluation, and over 1 percentage point in highly iterated runs with continuous resectioning.
