Using Finite State Transducers for Making Efficient Reading Comprehension Dictionaries

Ryan Johnson
University of Tromsø, Norway

Lene Antonsen
University of Tromsø, Norway

Trond Trosterud
University of Tromsø, Norway


Published in: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16

Linköping Electronic Conference Proceedings 85:10, pp. 59-71

NEALT Proceedings Series 16:10, pp. 59-71


Published: 2013-05-17

ISBN: 978-91-7519-589-6

ISSN: 1650-3686 (print), 1650-3740 (online)


This article presents a novel way of combining finite-state transducers (FSTs) with electronic dictionaries, thereby creating efficient reading comprehension dictionaries. We compare a North Saami - Norwegian and a South Saami - Norwegian dictionary, both enriched with an FST, with existing, available dictionaries containing pre-generated paradigms, and show the advantages of our approach. Being more flexible, the FSTs may also adjust the dictionary to different contexts. The finite-state transducer analyses the word to be looked up, and the dictionary itself conducts the actual lookup. The FST part is crucial for morphology-rich languages, where as few as 10% of the wordforms in running text are actually lemma forms. If a compound or derived word, or a word with an enclitic particle, is not found in the dictionary, the FST will give the stems and derivation affixes of the wordform, and each of the stems will be given a separate translation. In this way, the coverage of the FST-dictionary will be far larger than that of an ordinary dictionary of the same size.
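The division of labour described in the abstract (the analyser reduces a wordform to lemmas and stems, the dictionary looks them up, and unknown compounds fall back to their parts) can be sketched roughly as follows. This is only an illustration under stated assumptions: the hand-written `TOY_ANALYSER` table stands in for a real transducer, the `Analysis` structure, the tag names and the function names are hypothetical, and the sample wordforms and translations are toy entries chosen for the example, not output from the authors' system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str    # lemma of the whole word (a compound lemma for compounds)
    stems: tuple  # the individual stems; a single item for a simple word
    tags: tuple   # morphological tags, kept only for display


# Hypothetical stand-in for the FST: a hand-written table of analyses.
# A real system would run a compiled transducer here instead.
TOY_ANALYSER = {
    "viesus": [Analysis("viessu", ("viessu",), ("N", "Sg", "Loc"))],
    "skuvlaviesus": [
        Analysis("skuvlaviessu", ("skuvla", "viessu"), ("N", "Sg", "Loc"))
    ],
}

# Toy lemma dictionary (North Saami -> Norwegian); illustrative entries only.
TOY_DICTIONARY = {"viessu": "hus", "skuvla": "skole"}


def analyse(wordform):
    """Return all analyses for a wordform, or an empty list if unknown."""
    return TOY_ANALYSER.get(wordform, [])


def look_up(wordform):
    """Return (lemma, translation) pairs for an inflected wordform."""
    results = []
    for analysis in analyse(wordform):
        if analysis.lemma in TOY_DICTIONARY:
            # The whole lemma has its own dictionary entry.
            candidates = [analysis.lemma]
        else:
            # Fallback: translate each stem of the compound separately.
            candidates = list(analysis.stems)
        for candidate in candidates:
            entry = (candidate, TOY_DICTIONARY.get(candidate))
            if entry not in results:
                results.append(entry)
    return results
```

With these toy tables, `look_up("viesus")` returns `[("viessu", "hus")]`, while `look_up("skuvlaviesus")` finds no entry for the compound lemma and returns one translation per stem: `[("skuvla", "skole"), ("viessu", "hus")]`. The dictionary stores lemmas only, so it needs no pre-generated paradigms.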


Lexicography; Computational Morphology; Orthographic Variation; Finite-state Transducers; Electronic Dictionaries


