Building gold-standard treebanks for Norwegian

Per Erik Solberg
National Library of Norway, P.O. Box 2674 Solli, NO-0203 Oslo, Norway


In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Linköping Electronic Conference Proceedings 85:44, pp. 459-464

NEALT Proceedings Series 16:44, pp. 459-464


Published: 2013-05-17

ISBN: 978-91-7519-589-6

ISSN: 1650-3686 (print), 1650-3740 (online)


Språkbanken at the National Library of Norway is currently building gold-standard Dependency Grammar treebanks for Norwegian Bokmål and Nynorsk. The treebanks are manually annotated for morphological features, syntactic functions and dependency relations. This paper explains the choice of texts and the format of the treebanks, describes some key aspects of the morphological and syntactic annotation, and illustrates how the treebanks can be used.


Treebanking; Dependency Grammar; Morphology; Syntax; Norwegian


