Conference article

Enriching the Swedish Sign Language Corpus with Part of Speech Tags Using Joint Bayesian Word Alignment and Annotation Transfer

Robert Östling
Department of Linguistics, Stockholm University, Stockholm, Sweden

Carl Börstell
Department of Linguistics, Stockholm University, Stockholm, Sweden

Lars Wallin
Department of Linguistics, Stockholm University, Stockholm, Sweden

Published in: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 109:34, p. 263-268

NEALT Proceedings Series 23:34, p. 263-268

Published: 2015-05-06

ISBN: 978-91-7519-098-3

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We have used a novel Bayesian model of joint word alignment and part of speech (PoS) annotation transfer to enrich the Swedish Sign Language Corpus with PoS tags. The annotations were then hand-corrected, both to improve annotation quality for the corpus and to enable the empirical evaluation presented here.

References

Inger Ahlgren and Brita Bergman. 2006. Det svenska teckenspråket. In Teckenspråk och teckenspråkiga: kunskaps- och forskningsöversikt, volume 2006:29 of Statens offentliga utredningar (SoU), pages 11–70. Ministry of Health and Social Affairs, March.

Brita Bergman. 1983. Verbs and adjectives: morphological processes in Swedish Sign Language. In Jim Kyle and Bencie Woll, editors, Language in sign: An international perspective on sign language, pages 3–9, London. Croom Helm.

Lars Borin and Markus Forsberg. 2009. All in the family: A comparison of SALDO and WordNet. In NODALIDA 2009 Workshop on WordNets and other Lexical Semantic Resources – between Lexical Semantics, Lexicography, Terminology and Formal Ontologies, pages 7–12, Odense, Denmark.

Carl Börstell, Johanna Mesch, and Lars Wallin. 2014. Segmenting the Swedish Sign Language corpus: On the possibilities of using visual cues as a basis for syntactic segmentation. In Onno Crasborn, Eleni Efthimiou, Evita Fotinea, Thomas Hanke, Julie Hochgesang, Jette Kristoffersen, and Johanna Mesch, editors, Beyond the Manual Channel. Proceedings of the 6th Workshop on the Representation and Processing of Sign Languages, pages 7–10, Reykjavik, Iceland. ELRA.

Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311, June.

John DeNero, Alexandre Bouchard-Côté, and Dan Klein. 2008. Sampling alignment structure under a Bayesian translation model. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 314–323, Honolulu, Hawaii, October. Association for Computational Linguistics.

E. Ejerhed, G. Källgren, O. Wennstedt, and M. Åström. 1992. The linguistic annotation system of the Stockholm-Umeå corpus project. Technical report, Department of Linguistics, University of Umeå.

Yarin Gal and Phil Blunsom. 2013. A systematic Bayesian treatment of the IBM alignment models. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Stroudsburg, PA, USA. Association for Computational Linguistics.

Barbara Hunger. 2006. Noun/verb pairs in Austrian Sign Language (ÖGS). Sign Language & Linguistics, 9(1/2):71–94, January.

Trevor Johnston. 2001. Nouns and verbs in Australian sign language: An open and shut case? Journal of Deaf Studies and Deaf Education, 6(4):235–257, January.

Trevor Johnston. 2010. From archive to corpus: Transcription and annotation in the creation of signed language corpora. International Journal of Corpus Linguistics, 15(1):106–131.

Trevor Johnston. 2014. Auslan corpus annotation guidelines. Centre for Language Sciences, Department of Linguistics, Macquarie University.

Vadim Kimmelman. 2009. Parts of speech in Russian Sign Language: The role of iconicity and economy. Sign Language & Linguistics, 12(2):161–186.

Gunnel Källgren. 2006. Manual of the Stockholm Umeå Corpus version 2.0. Department of Linguistics, Stockholm University, December. Sofia Gustafson-Capková and Britt Hartmann (eds.).

Coskun Mermer and Murat Saraçlar. 2011. Bayesian word alignment for statistical machine translation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, HLT ’11, pages 182–187, Stroudsburg, PA, USA. Association for Computational Linguistics.

Johanna Mesch, Lars Wallin, and Thomas Björkstrand. 2012. Sign language resources in Sweden: Dictionary and corpus. In Proceedings of the 5th Workshop on the Representation and Processing of Sign Languages: Interactions between Corpus and Lexicon, Language Resources and Evaluation Conference (LREC), pages 127–130, Istanbul, Turkey.

Johanna Mesch, Maya Rohdell, and Lars Wallin. 2014. Annoterade filer för svensk teckenspråkskorpus. Version 2. http://www.ling.su.se.

Robert Östling. 2013. Stagger: An open-source part of speech tagger for Swedish. North European Journal of Language Technology, 3:1–18.

Darcey Riley and Daniel Gildea. 2012. Improving the IBM alignment models using variational Bayes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2, ACL ’12, pages 306–310, Stroudsburg, PA, USA. Association for Computational Linguistics.

Waldemar Schwager and Ulrike Zeshan. 2008. Word classes in sign languages: Criteria and classifications. Studies in Language, 32(3):509–545, September.

Johan Sjons. 2013. Automatic induction of word classes in Swedish Sign Language. Master’s thesis, Stockholm University.

Ted Supalla and Elissa L. Newport. 1978. How many seats in a chair?: The derivation of nouns and verbs in American Sign Language. In Patricia Siple, editor, Understanding language through sign language research, chapter 4, pages 91–132. Academic Press, New York, NY.

Oksana Tkachman and Wendy Sandler. 2013. The noun-verb distinction in two young sign languages. Gesture, 13(3):253–286.

Kristina Toutanova, H. Tolga Ilhan, and Christopher Manning. 2002. Extensions to HMM-based statistical word alignment models. In 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), pages 87–94.

Stephan Vogel, Hermann Ney, and Christoph Tillmann. 1996. HMM-based word alignment in statistical translation. In Proceedings of the 16th Conference on Computational Linguistics - Volume 2, COLING ’96, pages 836–841, Stroudsburg, PA, USA. Association for Computational Linguistics.

Lars Wallin, Johanna Mesch, and Anna-Lena Nilsson. 2014. Transkriptionskonventioner för teckenspråkstexter (version 5). Technical report, Sign Language, Department of Linguistics, Stockholm University.

Peter Wittenburg, Hennie Brugman, Albert Russel, Alex Klassmann, and Han Sloetjes. 2006. ELAN: a professional framework for multimodality research. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06). http://tla.mpi.nl/tools/tla-tools/elan/.

David Yarowsky, Grace Ngai, and Richard Wicentowski. 2001. Inducing multilingual text analysis tools via robust projection across aligned corpora. In Proceedings of the First International Conference on Human Language Technology Research, HLT ’01, pages 1–8, Stroudsburg, PA, USA. Association for Computational Linguistics.
