Conference paper

Universal Dependencies for Afrikaans

Peter Dirix
University of Leuven, Centre for Computational Linguistics, Belgium

Liesbeth Augustinus
University of Leuven, Centre for Computational Linguistics, Belgium / FWO-Vlaanderen, Belgium

Daniel van Niekerk
North West University, Centre for Text Technology, South Africa

Frank Van Eynde
University of Leuven, Centre for Computational Linguistics, Belgium

Published in: Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies, 22 May, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 135:5, pp. 38-47

NEALT Proceedings Series 31:5, pp. 38-47

Published: 2017-05-29

ISBN: 978-91-7685-501-0

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

The Universal Dependencies (UD) project aims to develop a consistent annotation framework for treebanks across many languages. In this paper, we present the UD scheme for Afrikaans and describe the conversion of the AfriBooms treebank to this new format. We also compare this conversion with the UD conversion of related syntactic structures in typologically similar languages.
