Conference article

North-Sámi to Finnish rule-based machine translation system

Ryan Johnson
UiT Norgga Árktalaš universitehta, Giela ja kultuvrra instituhtta, Romssa, Norway

Tommi A Pirinen
Universität Hamburg, Hamburger Zentrum für Sprachkorpora, Germany

Tiina Puolakainen
Institute of the Estonian Language, Estonia

Francis Tyers
UiT Norgga Árktalaš universitehta, Giela ja kultuvrra instituhtta, Romssa, Norway

Trond Trosterud
UiT Norgga Árktalaš universitehta, Giela ja kultuvrra instituhtta, Romssa, Norway

Kevin Unhammer
UiT Norgga Árktalaš universitehta, Giela ja kultuvrra instituhtta, Romssa, Norway

Published in: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 131:14, p. 115-122

NEALT Proceedings Series 29:14, p. 115-122

Published: 2017-05-08

ISBN: 978-91-7685-601-7

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper presents a machine translation system between Finnish and North Sámi, two Uralic languages, concentrating on the translation direction into Finnish. As background, we present the differences between the two languages and then describe how the system was designed to handle some of them. We then evaluate the system’s performance and outline directions for future work.
