Rune Lain Knudsen
Institute of Linguistic and Nordic Studies, University of Oslo
Ruth Vatvedt Fjeld
Institute of Linguistic and Nordic Studies, University of Oslo
Published in: Proceedings of the workshop on lexical semantic resources for NLP at NODALIDA 2013, May 22-24, 2013, Oslo, Norway. NEALT Proceedings Series 19
Linköping Electronic Conference Proceedings 88:3, p. 12-20
NEALT Proceedings Series 19:3, p. 12-20
Published: 2013-05-17
ISBN: 978-91-7519-586-5
ISSN: 1650-3686 (print), 1650-3740 (online)
At the Department of Linguistics and Scandinavian Studies (ILN) at the University of Oslo, the task of assembling a balanced corpus representing modern Norwegian Bokmål has reached a significant milestone. The Corpus for Bokmål Lexicography (LBK) now consists of more than 100,000,000 words. The documents in the corpus have been selected based on a statistical analysis of reading habits in the general population of Norway. Each document has been subject to both manual bibliographic annotation and automatic morphological annotation. LBK will play a central role in a set of interconnected lexical resources, the aim of which is to provide an extensive documentation of Norwegian Bokmål that covers lexical and other linguistic/lexico-syntactic aspects. This paper presents LBK2013, a subset of LBK that we consider to be an accurate and comprehensive representation of modern written Norwegian Bokmål. The corpus and a number of related projects are described.
NoDaLiDa 2013; Speech and Language Technologies; Northern Europe; Corpora; Lexicography; Lexical Semantics