Conference article

Automatic identification of construction candidates for a Swedish constructicon

Linnéa Bäckström
Dept. of Swedish, University of Gothenburg, Sweden

Lars Borin
Dept. of Swedish, University of Gothenburg, Sweden

Markus Forsberg
Dept. of Swedish, University of Gothenburg, Sweden

Benjamin Lyngfelt
Dept. of Swedish, University of Gothenburg, Sweden

Julia Prentice
Dept. of Swedish, University of Gothenburg, Sweden

Emma Sköldberg
Dept. of Swedish, University of Gothenburg, Sweden


Published in: Proceedings of the workshop on lexical semantic resources for NLP at NODALIDA 2013, May 22–24, 2013, Oslo, Norway. NEALT Proceedings Series 19

Linköping Electronic Conference Proceedings 88:2, p. 2-11

NEALT Proceedings Series 19:2, p. 2-11


Published: 2013-05-17

ISBN: 978-91-7519-586-5

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We present an experiment designed to extract construction candidates for a Swedish constructicon from text corpora. We have explored the use of hybrid n-grams with the practical goal of discovering previously undescribed, partially schematic constructions. The experiment was successful, in that quite a few new constructions were discovered. The precision is low, but as a push-button tool for construction discovery, it has proven valuable for the work on a Swedish constructicon.
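The abstract does not spell out how the hybrid n-grams are built. As a rough illustration only, the Python sketch below assumes one simple reading of the term, loosely in the spirit of StringNet (Wible and Tsao 2010): each token position in an n-gram is realised either as its word form or as its part-of-speech tag, and the candidates are the frequent n-grams that mix fixed words with open tag slots. The function names, the `<TAG>` slot notation, the toy tagged sentences and the frequency threshold are all hypothetical; this is not the authors' implementation.

```python
from collections import Counter
from itertools import product
from typing import List, Tuple

# Hypothetical input format: a tagged sentence is a list of (word_form, pos_tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def slot(tag: str) -> str:
    """Render a POS tag as an open slot, e.g. 'NN' -> '<NN>' (notation is an assumption)."""
    return f"<{tag}>"


def is_slot(item: str) -> bool:
    """True if an n-gram item is an open POS slot rather than a fixed word form."""
    return item.startswith("<") and item.endswith(">")


def hybrid_ngrams(sentence: TaggedSentence, n: int):
    """Yield all hybrid n-grams of one sentence.

    Every window of n tokens yields 2**n variants, because each position is
    realised either as its word form or as its POS slot.
    """
    for start in range(len(sentence) - n + 1):
        window = sentence[start:start + n]
        for use_tag in product((False, True), repeat=n):
            yield tuple(
                slot(tag) if as_slot else word
                for (word, tag), as_slot in zip(window, use_tag)
            )


def candidate_constructions(corpus, n, min_freq):
    """Count partially schematic hybrid n-grams and keep the frequent ones.

    'Partially schematic' is taken here to mean at least one fixed word form
    and at least one open POS slot.
    """
    counts = Counter()
    for sentence in corpus:
        for gram in hybrid_ngrams(sentence, n):
            n_slots = sum(is_slot(item) for item in gram)
            if 0 < n_slots < n:  # drop fully lexical and fully schematic n-grams
                counts[gram] += 1
    return [(gram, freq) for gram, freq in counts.most_common() if freq >= min_freq]


# Toy tagged corpus with SUC-style tags, invented for illustration.
TOY_CORPUS = [
    [("han", "PN"), ("kom", "VB"), ("på", "PP"), ("måndag", "NN")],
    [("hon", "PN"), ("åkte", "VB"), ("på", "PP"), ("fredag", "NN")],
]

if __name__ == "__main__":
    for gram, freq in candidate_constructions(TOY_CORPUS, n=2, min_freq=2):
        print(freq, " ".join(gram))
```

With the toy corpus, the only bigrams that recur are `<VB> på` and `på <NN>`. A real run would need a large tagged corpus and a much higher threshold, and, given the low precision reported in the abstract, the output would still call for manual inspection.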

Keywords

Hybrid n-gram, Swedish, constructions, constructicon

References

Biber, D. and Barbieri, F. (2007). Lexical bundles in university spoken and written registers. English for Specific Purposes, 26:263–286.

Biber, D. and Conrad, S. (1999). Lexical bundles in conversation and academic prose. In Hasselgard, H. and Oksefjell, S., editors, Out of corpora: Studies in honor of Stig Johansson, pages 77–85. Rodopi, Amsterdam.

Borin, L., Danélls, D., Forsberg, M., Kokkinakis, D., and Toporowska Gronostaj, M. (2010). The past meets the present in Swedish FrameNet++. In 14th EURALEX International Congress, pages 269–281, Leeuwarden. EURALEX.

Borin, L. and Forsberg, M. (2009). All in the family: A comparison of SALDO and WordNet. In Proceedings of the Nodalida 2009 Workshop on WordNets and other Lexical Semantic Resources – between Lexical Semantics, Lexicography, Terminology and Formal Ontologies, Odense. NEALT.

Borin, L., Forsberg, M., and Lönngren, L. (2008). The hunting of the BLARK – SALDO, a freely available lexical database for Swedish language technology. In Nivre, J., Dahllöf, M., and Megyesi, B., editors, Resourceful language technology. Festschrift in honor of Anna Sågvall Hein, number 7 in Acta Universitatis Upsaliensis: Studia Linguistica Upsaliensia, pages 21–32. Uppsala University, Department of Linguistics and Philology, Uppsala.

Borin, L., Forsberg, M., Olsson, L.-J., and Uppström, J. (2012a). The open lexical infrastructure of Språkbanken. In Proceedings of LREC 2012, pages 3598–3602, Istanbul. ELRA.

Borin, L., Forsberg, M., and Roxendal, J. (2012b). Korp – the corpus infrastructure of Språkbanken. In Proceedings of LREC 2012, pages 474–478, Istanbul. ELRA.

Ejerhed, E. and Källgren, G. (1997). Stockholm Umeå corpus 1.0. Produced by Department of Linguistics, Umeå University and Department of Linguistics, Stockholm University. ISBN 91-7191-348-3.

Ejerhed, E., Källgren, G., Wennstedt, O., and Åström, M. (1992). The linguistic annotation system of the Stockholm-Umeå corpus project - description and guidelines. Technical report, Department of Linguistics, Umeå University.

Fillmore, C., Lee-Goldman, R., and Rhomieux, R. (2012). The FrameNet constructicon. In Boas, H. and Sag, I., editors, Sign-Based Construction Grammar, pages 309–372. CSLI, Stanford.

Fillmore, C. J. (2008). Border conflicts: FrameNet meets Construction Grammar. In Bernal, E. and DeCesaris, J., editors, Proceedings of the XIII EURALEX International Congress, pages 49–68, Barcelona. Universitat Pompeu Fabra.

Köhler, P. O. and Messelius, U. (2001). Natur och Kulturs svenska ordbok. Bokförlaget Natur och Kultur, Stockholm.

Källström, R. (2012). Svenska i kontrast. Tvärspråkliga perspektiv på svensk grammatik. Studentlitteratur, Lund.

Lagus, K., Kohonen, O., and Virpioja, S. (2009). Towards unsupervised learning of constructions from text. In Sahlgren, M. and Knutsson, O., editors, Proceedings of the Workshop on Extracting and Using Constructions in NLP of 17th Nordic Conference on Computational Linguistics, NODALIDA. SICS Technical Report T2009:10.

Lyngfelt, B., Borin, L., Forsberg, M., Prentice, J., Rydstedt, R., Sköldberg, E., and Tingsell, S. (2012). Adding a constructicon to the Swedish resource network of Språkbanken. In Proceedings of KONVENS 2012 (LexSem 2012 workshop), pages 452–461, Vienna.

Megyesi, B. (2009). The open source tagger HunPoS for Swedish. In Jokinen, K. and Bick, E., editors, Proceedings of the Nordic Conference on Computational Linguistics (Nodalida), volume 4 of NEALT Proceedings Series, pages 239–241, Odense, Denmark.

Nekrasova, T. M. (2009). English L1 and L2 speakers’ knowledge of lexical bundles. Language Learning, 59(3):647–686.

Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryiğit, G., Kübler, S., Marinov, S., and Marsi, E. (2007). MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2):95–135.

Pecina, P. (2010). Lexical association measures and collocation extraction. Language Resources and Evaluation, 44:137–158.

Prentice, J. (2011). ”Jag är född på andra november”: konventionaliserade tidsuttryck som konstruktioner – ur ett andraspråksperspektiv. Technical report, Institutionen för svenska språket, Göteborgs universitet.

Prentice, J. and Sköldberg, E. (2011). Figurative word combinations in texts written by adolescents in multilingual school environments. In Källström, R. and Lindberg, I., editors, Young urban Swedish. Variation and change in multilingual settings. University of Gothenburg.

Svenska Akademien (2009). Svensk ordbok. Norstedts, Stockholm.

Svenska språknämnden (2005). Språkriktighetsboken. Norstedts Akademiska Förlag, Stockholm.

Tsao, N.-L. and Wible, D. (2009). A method for unsupervised broad-coverage lexical error detection and correction. In Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications, pages 51–54, Boulder. ACL.

Wermter, J. and Hahn, U. (2006). You can’t beat frequency (unless you use linguistic knowledge) – A qualitative evaluation of association measures for collocation and term extraction. In Proceedings of COLING-ACL 2006, pages 785–792, Sydney. ACL.

Wible, D. and Tsao, N.-L. (2010). StringNet as a computational resource for discovering and investigating linguistic constructions. In Proceedings of the NAACL HLT Workshop on Extracting and Using Constructions in Computational Linguistics, pages 25–31, Los Angeles. ACL.

Wible, D. and Tsao, N.-L. (2011). The StringNet lexico-grammatical knowledgebase and its applications. In Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World, pages 128–130, Portland. ACL.
