Cleaning up the Basque grammar: a work in progress

Inari Listenmaa
University of Gothenburg, Sweden

Jose Maria Arriola
University of the Basque Country, Spain

Itziar Aduriz
University of Barcelona, Spain

Eckhard Bick
University of Southern Denmark, Denmark

Published in: Proceedings of the NoDaLiDa 2017 Workshop on Constraint Grammar - Methods, Tools and Applications, 22 May 2017, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 140:3, pp. 10-14

NEALT Proceedings Series 33:3, pp. 10-14

Published: 2017-07-06

ISBN: 978-91-7685-465-5

ISSN: 1650-3686 (print), 1650-3740 (online)


The first version of the Basque Constraint Grammar (BCG) was developed in 1995–1997 by two linguists (Aduriz et al., 1997), based on the Constraint Grammar theory of Karlsson (1990) and Karlsson et al. (1995). Since then, it has undergone many changes by many grammarians. During the two decades of development, the Basque morphological analyser has also been updated several times, not always in sync with the CG. As a result, the Basque grammar needs serious attention.

In the present paper, we describe the ongoing process of cleaning up the Basque grammar. We use a variety of tools and methods, ranging from simple string replacements to SAT-based symbolic evaluation, introduced in Listenmaa and Claessen (2016), and grammar tuning by Bick (2013). We present our experiences in combining all these tools, along with a few modest additions to the simpler end of the scale.



Itziar Aduriz, José María Arriola, Xabier Artola, Arantza Diaz de Ilarraza, Koldo Gojenola, and Montse Maritxalar. 1997. Morphosyntactic disambiguation for Basque based on the Constraint Grammar formalism. In Proceedings of Recent Advances in NLP (RANLP97).

Itziar Aduriz, Maria Jesús Aranzabe, Jose Maria Arriola, Aitziber Atutxa, Arantza Diaz de Ilarraza, Nerea Ezeiza, Koldo Gojenola, Maite Oronoz, Aitor Soroa, and Ruben Urizar. 2006. Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing. In Corpus Linguistics Around the World, volume 56 of Language and Computers, pages 1–15. Rodopi, Netherlands.

Izaskun Aldezabal, Olatz Ansa, Bertol Arrieta, Xabier Artola, Aitzol Ezeiza, Gregorio Hernández, and Mikel Lersundi. 2001. EDBL: a general lexical basis for the automatic processing of Basque. In IRCS Workshop on Linguistic Databases, Philadelphia, USA.

Eckhard Bick, Kristin Hagen, and Anders Nøklestad. 2015. Optimizing the Oslo-Bergen Tagger, pages 11–19. Linköping University Electronic Press.

Eckhard Bick. 2013. ML-Tuned Constraint Grammars. In Proceedings of the 27th Pacific Asia Conference on Language, Information and Computation (PACLIC 2013), pages 440–449.

Nerea Ezeiza, Itziar Aduriz, Iñaki Alegria, Jose Mari Arriola, and Ruben Urizar. 1998. Combining stochastic and rule-based methods for disambiguation in agglutinative languages. In COLING-ACL '98, volume 1, pages 380–384, Montreal, Canada, August 10–14, 1998.

Fred Karlsson, Atro Voutilainen, Juha Heikkilä, and Arto Anttila. 1995. Constraint Grammar: a language-independent system for parsing unrestricted text, volume 4. Walter de Gruyter.

Fred Karlsson. 1990. Constraint grammar as a framework for parsing running text. In Proceedings of 13th International Conference on Computational Linguistics (COLING 1990), volume 3, pages 168–173, Stroudsburg, PA, USA. Association for Computational Linguistics.

Inari Listenmaa and Koen Claessen. 2016. Analysing Constraint Grammars with a SAT-solver. In Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016).
