Conference article

Towards a Dependency-based PropBank of General Finnish

Katri Haverinen
Turku Centre for Computer Science (TUCS), Turku, Finland

Veronika Laippala
Department of Languages and Translation Studies, University of Turku, Finland

Samuel Kohonen
Department of Information Technology, University of Turku, Finland

Anna Missilä
Department of Information Technology, University of Turku, Finland

Jenna Nyblom
Department of Information Technology, University of Turku, Finland

Stina Ojala
Department of Information Technology, University of Turku, Finland

Timo Viljanen
Department of Information Technology, University of Turku, Finland

Tapio Salakoski
Turku Centre for Computer Science (TUCS), Turku, Finland

Filip Ginter
Department of Information Technology, University of Turku, Finland


Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Linköping Electronic Conference Proceedings 85:9, p. 41-57

NEALT Proceedings Series 16:9, p. 41-57


Published: 2013-05-17

ISBN: 978-91-7519-589-6

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

In this work, we present the first results of a project aiming at a Finnish Proposition Bank, an annotated corpus of semantic roles. The annotation is based on an existing treebank of Finnish, the Turku Dependency Treebank, annotated using the well-known Stanford Dependency scheme. We describe the use of the dependency treebank for PropBanking purposes and show that both annotation layers present in the treebank are highly useful for the annotation of semantic roles. We also discuss the specific features of Finnish influencing the development of a PropBank as well as the methods employed in the annotation, and finally, we present a preliminary evaluation of the annotation quality.

Keywords

PropBank; Finnish; dependency

