Conference article

A preliminary constraint grammar for Russian

Francis M. Tyers
HSL-fakultehta, UiT Norgga árktalaš universitehta, Romsa, Norway

Robert Reynolds
UiT Norgga árktalaš universitehta, Romsa, Norway


Published in: Proceedings of the Workshop on “Constraint Grammar - methods, tools and applications” at NODALIDA 2015, May 11-13, 2015, Institute of the Lithuanian Language, Vilnius, Lithuania

Linköping Electronic Conference Proceedings 113:7, pp. 39-46

NEALT Proceedings Series 24:7, pp. 39-46


Published: 2015-06-17

ISBN: 978-91-7519-037-2

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper presents preliminary work on a constraint-grammar-based disambiguator for Russian. Russian is a Slavic language whose inflectional system shows a high degree of homonymy, both within a word class (in-category) and across word classes (out-category). The pipeline consists of a finite-state morphological analyser followed by a constraint grammar. The constraint grammar is tuned for high recall (over 0.99) at the expense of precision.
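The recall/precision trade-off mentioned in the abstract can be illustrated with a minimal sketch. It assumes the definition commonly used when a disambiguator may leave several readings per token: recall is the share of tokens whose correct reading survives, and precision is the share of surviving readings that are correct. The function name, the tag strings, and the toy data are hypothetical and are not taken from the paper.

```python
def precision_recall(output, gold):
    """Score a disambiguator that may leave several readings per token.

    output: list of sets, the readings remaining for each token
    gold:   list of strings, the single correct reading for each token
    (Assumed scoring scheme; the paper's exact evaluation may differ.)
    """
    # A token counts as a hit when its correct reading was not removed.
    true_pos = sum(1 for readings, correct in zip(output, gold)
                   if correct in readings)
    total_output = sum(len(readings) for readings in output)
    precision = true_pos / total_output if total_output else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical three-token example with made-up tags.
remaining = [{"N.Nom", "N.Acc"}, {"V.Pres"}, {"N.Gen", "N.Dat", "N.Prp"}]
correct = ["N.Acc", "V.Pres", "N.Gen"]
print(precision_recall(remaining, correct))
```

In this toy case every correct reading survives, so recall is perfect, while the three leftover incorrect readings halve the precision. A cautious grammar that removes a reading only when the context is unambiguous behaves in this way.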


